Transportation inequalities for Markov kernels and their applications

We study the relationship between functional inequalities for a Markov kernel on a metric space $X$ and inequalities of transportation distances on the space of probability measures $\mathcal{P}(X)$. We extend results of Luise and Savar\'e on contraction inequalities for the heat semigroup on $\mathcal{P}(X)$ when $X$ is an $RCD(K,\infty)$ metric space, with respect to the Hellinger and Kantorovich--Wasserstein distances, and explore applications to more general Markov kernels satisfying a reverse Poincar\'e inequality. A key idea is a ``dynamic dual'' formulation of these transportation distances. We also modify this formulation to define a new family of divergences on $\mathcal{P}(X)$ which generalize the R\'enyi divergence, and relate them to reverse logarithmic Sobolev inequalities. Applications include results on the convergence of Markov processes to equilibrium, and on quasi-invariance of heat kernel measures in finite and infinite-dimensional groups.


Introduction
The goal of this paper is to build upon recent results of G. Luise and G. Savaré [23] on contraction properties of the flow of a heat semigroup in spaces of measures. There, the authors study a "dynamic dual" formulation of various distances between probability measures on a metric measure space, including the Kantorovich-Wasserstein and Hellinger distances as well as a family of Hellinger-Kantorovich distances $\mathrm{HK}_\alpha$ introduced in [22]. They focus on the setting of $RCD(K,\infty)$ spaces, in which the canonical heat semigroup $P_t$ generated by the Cheeger energy satisfies a Bakry-Émery curvature condition; very roughly speaking, these spaces are generalizations of Riemannian manifolds with Ricci curvature bounded from below. Under this assumption, they obtain contraction inequalities of the form
$$\mathrm{He}_2(\mu_0 P_t, \mu_1 P_t) \le \mathrm{HK}_{\alpha(t)}(\mu_0, \mu_1), \qquad (1.1)$$
where $\mu P_t$ denotes the dual action of the heat semigroup $P_t$ on measures, $\mathrm{He}_2$ and $\mathrm{HK}_\alpha$ are the Hellinger and Hellinger-Kantorovich distances respectively, and $\alpha(t)$ depends on $K$.
The first goal of the present paper is to observe that the techniques of [23] are not limited to the setting of $RCD(K,\infty)$ spaces. In obtaining (1.1), the key ingredient is the fact that $RCD(K,\infty)$ spaces satisfy a reverse Poincaré inequality of the form
$$|\nabla P_t f|^2 \le \frac{K}{e^{2Kt}-1}\left(P_t(f^2) - (P_t f)^2\right) \qquad (1.2)$$
(with the constant interpreted as $\frac{1}{2t}$ when $K = 0$). Indeed, the inequality (1.2), with its specific time-dependent constant $\frac{K}{e^{2Kt}-1}$, is equivalent to the Bakry-Émery curvature condition; see for instance [2, Proposition 3.3]. However, there are many interesting cases where $RCD(K,\infty)$ is not satisfied, or $P_t$ is something other than the canonical heat semigroup, yet one still has reverse Poincaré inequalities of the form
$$|\nabla P_t f|^2 \le C(t)\left(P_t(f^2) - (P_t f)^2\right), \qquad (1.3)$$
where now $C(t)$ may take a different form than in (1.2). We discuss several examples in Section 5, including semigroups which are non-local, non-symmetric, or non-elliptic.
In such settings, the dynamic dual formulation makes it easy to show that (1.3) implies a corresponding Hellinger-Kantorovich contraction statement similar to (1.1). In fact, in this paper we consider a wider family of functional inequalities, including both Poincaré and reverse Poincaré inequalities, and we show that each implies a corresponding transportation cost inequality. This is carried out in Section 3. These implications rely on very little beyond the Markov property of the operators $P_t$; it is not necessary that $P_t$ be the semigroup whose carré du champ is given by $|\nabla f|^2$, nor even that $P_t$ be a semigroup at all.
In Section 4, we extend the results of [23] in a different direction. By introducing a logarithmic term in the dynamic dual definition of the Hellinger-Kantorovich distance, we obtain a new family of "entropic" divergences $T_{a,b}$ which generalize the Rényi divergence. We then show that contraction results with respect to these divergences are related to (reverse) logarithmic Sobolev inequalities.
In Sections 5 and 6, we discuss a number of examples of spaces where these techniques apply, beyond the $RCD(K,\infty)$ setting discussed in [23], and applications to questions such as convergence to equilibrium and quasi-invariance of measures. These examples include Langevin dynamics driven by Lévy processes, semigroups arising in sub-Riemannian geometry, infinite-dimensional spaces modeled on abstract Wiener space, and others.
Since the focus of this paper is on techniques and their applications, we have not attempted to state the results in the greatest possible generality, or to describe the most minimal abstract conditions under which the statements hold. We prefer instead to work in more concrete settings which more clearly illustrate the ideas, with the expectation that readers will be able to adapt those ideas to other settings as needed.
Acknowledgments. The authors are grateful for helpful discussions with Maria Gordina, Martin Hairer, Kazumasa Kuwada, Xue-Mei Li, and Giuseppe Savaré. This article was completed during a sabbatical visit by author N. Eldredge to the Department of Mathematics at the University of Connecticut; he would like to thank the Department and especially Maria Gordina for their hospitality, particularly in view of the difficult circumstances created by the COVID-19 pandemic.

General setup and notation
Throughout the paper, unless otherwise specified, $(X,d)$ denotes a complete, proper, separable metric space which is a length space (in particular, path connected), equipped with a strong upper gradient $|\nabla f|$ as defined in [1, Definition 1.2.1]. More precisely, for a measurable function $f : X \to \mathbb{R}$ we define the local Lipschitz constant
$$|\nabla f|(x) := \limsup_{y \to x} \frac{|f(y) - f(x)|}{d(x,y)},$$
and we denote by $\mathrm{Lip}_b(X)$ the space of all bounded Lipschitz functions on $X$. Then we have the following result.
Lemma 2.1 (Proposition 1.11, [14]) For every $f \in \mathrm{Lip}_b(X)$, $|\nabla f|$ is a strong upper gradient, in the sense that for each rectifiable curve $\gamma : [0,L] \to X$ parametrized by arc length we have
$$|f(\gamma(L)) - f(\gamma(0))| \le \int_0^L |\nabla f|(\gamma(s))\, ds.$$
One may also verify that $|\nabla f|$ satisfies the chain rule.
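Let $\mathcal{B}_X$ denote the Borel σ-algebra of $(X,d)$, and $\mathcal{P}(X)$ the set of Borel probability measures on $X$. We suppose we are given a Markov probability kernel $P : X \times \mathcal{B}_X \to [0,1]$, and we denote by $Pf$, $\mu P$ the usual action of $P$ on bounded Borel functions $f$ and Borel probability measures $\mu$, i.e.
$$Pf(x) := \int_X f(y)\, P(x, dy), \qquad \mu P(A) := \int_X P(x, A)\, \mu(dx), \quad A \in \mathcal{B}_X.$$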
In some applications, $P$ will be taken to be a Markov semigroup $P_t$, which may or may not be symmetric with respect to some reference measure $m$. Our setting is similar to [20]. This is more general than the setting of [23], which only considered the symmetric semigroup $P_t$ generated by the Cheeger energy with respect to the given gradient and a reference measure.
Given $\mu_0, \mu_1 \in \mathcal{P}(X)$, the 2-Kantorovich-Wasserstein distance $W_2(\mu_0,\mu_1)$ is defined as usual by
$$W_2^2(\mu_0,\mu_1) := \inf_{\mu} \int_{X \times X} d(x,y)^2\, \mu(dx\, dy),$$
the infimum taken over all couplings $\mu \in \mathcal{P}(X \times X)$ of $\mu_0, \mu_1$. In particular, for point masses $\mu_i = \delta_{x_i}$, we have $W_2(\delta_{x_0}, \delta_{x_1}) = d(x_0,x_1)$. We let $\mathcal{P}_2(X) \subset \mathcal{P}(X)$ denote the Wasserstein space of probability measures $\mu$ having finite second moment, i.e. for which $\int_X d(x,x_0)^2\, \mu(dx) < \infty$ for some (equivalently, all) $x_0 \in X$. The 2-Hellinger distance is defined by
$$\mathrm{He}_2^2(\mu_0,\mu_1) := \int_X \left(\sqrt{\frac{d\mu_0}{dm}} - \sqrt{\frac{d\mu_1}{dm}}\right)^2 dm,$$
where $m$ is any measure such that $\mu_0, \mu_1$ are both absolutely continuous with respect to $m$; the definition is independent of $m$. Convergence in Hellinger distance is equivalent to convergence in total variation, and we have $\mathrm{He}_2^2(\mu_0,\mu_1) \le 2$, with equality iff $\mu_0, \mu_1$ are mutually singular. In Section 6 we depart somewhat from this setting to consider infinite-dimensional examples based on abstract Wiener space. There, $X$ is a separable Banach space (which is not proper), the test functions are taken to be the cylinder functions instead of all bounded Lipschitz functions, and the gradient $\nabla$ is derived from the Malliavin gradient, whose norm is not an upper gradient with respect to the norm distance on $X$. This requires only slight modifications to the arguments in the earlier sections; we discuss the details in Section 6.
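On a finite set, the Hellinger definition above can be evaluated directly. The following is a minimal sketch, assuming discrete measures given as probability vectors; the function name `hellinger_sq` is our own. It illustrates two facts stated in the text:
- the value does not depend on the reference measure $m$;
- $\mathrm{He}_2^2 \le 2$, with equality for mutually singular measures.

```python
import numpy as np

def hellinger_sq(p0, p1, m=None):
    """Squared 2-Hellinger distance He_2^2 between two discrete probability vectors.

    p0, p1: masses of mu_0, mu_1 on a finite set.
    m: reference measure on the same set (defaults to counting measure); it must be
       positive wherever p0 or p1 is, and the result should not depend on it.
    """
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    m = np.ones_like(p0) if m is None else np.asarray(m, dtype=float)
    rho0, rho1 = p0 / m, p1 / m                      # densities d(mu_i)/dm
    return float(np.sum((np.sqrt(rho0) - np.sqrt(rho1)) ** 2 * m))

mu0 = [0.2, 0.3, 0.5, 0.0]
mu1 = [0.3, 0.3, 0.4, 0.0]
# Same value with the counting measure and with another reference measure.
print(hellinger_sq(mu0, mu1), hellinger_sq(mu0, mu1, m=[1.0, 2.0, 3.0, 4.0]))
# Mutually singular measures attain the maximal value 2.
print(hellinger_sq([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.25, 0.75]))
```

Expanding the square gives the equivalent form $\mathrm{He}_2^2 = 2 - 2\sum_i \sqrt{p_0(i)\,p_1(i)}$. This makes the bound by 2 and the equality case for disjoint supports evident.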

The dynamic dual formulation and basic properties
In this section, we consider the family of Hellinger-Kantorovich distances studied in [21,22,23]. We focus on the so-called dynamic dual formulation of these distances, in which they may be defined as the supremum of a difference of integrals over a class of subsolutions of a Hamilton-Jacobi-type equation in time and space variables. This idea is directly descended from a dynamic dual formulation of the Kantorovich-Wasserstein distance, introduced in [24]. Using this formulation of these distances, we see that Poincaré and reverse Poincaré type inequalities for P lead directly to contraction results with respect to these distances.
We study the Hellinger-Kantorovich distance via a slightly different parametrization which is more convenient for our purposes. As above, let $\mathrm{Lip}_b(X)$ denote the Banach space of all bounded Lipschitz functions on $X$. We remark for later use that for any finite measure $\mu$ on $X$, $\mathrm{Lip}_b(X)$ is dense in $L^1(\mu)$; in particular, for any bounded Borel function $f$ there is a sequence $f_n \in \mathrm{Lip}_b(X)$ with $f_n \to f$ $\mu$-a.e.
Definition 3.1 Let $a, b \ge 0$. We denote by $\mathcal{A}_{a,b}$ the class of all functions $\varphi \in C^1([0,1], \mathrm{Lip}_b(X))$ satisfying the differential inequality
$$\partial_s \varphi_s + a|\nabla \varphi_s|^2 + b\varphi_s^2 \le 0 \quad \text{on } X, \text{ for all } s \in [0,1].$$
Then for probability measures $\mu_0, \mu_1 \in \mathcal{P}(X)$ we set
$$W_{a,b}(\mu_0,\mu_1) := \sup_{\varphi \in \mathcal{A}_{a,b}} \left( \int_X \varphi_1\, d\mu_1 - \int_X \varphi_0\, d\mu_0 \right).$$
Lemma 3.2 The distances $W_{a,b}$ satisfy the following basic properties: (ii) For any $c > 0$, we have $W_{ca,cb} = c^{-1} W_{a,b}$.
where $\mathrm{He}_2$ is the Hellinger 2-distance.
Item (ii) holds because $\varphi \in \mathcal{A}_{ca,cb}$ if and only if $c\varphi \in \mathcal{A}_{a,b}$, as one sees by multiplying the defining differential inequality by $c$. For item (iii), in the notation of [23, Eq. (39)] (see also [22, Section 8.4]), we have $\mathrm{HK}_\alpha^2 = W_{\alpha/4,1}$, and the general statement follows using item (ii). Item (iv) can be found as Proposition 2.10 of [23], but goes back at least as far as [24, Section 3]; see also other references in [23].
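For the pure Hellinger case $a = 0$, the dynamic dual can be checked by hand. The constraint $\partial_s\varphi_s + b\varphi_s^2 \le 0$ involves no spatial derivatives. Its equality case is solved by $\varphi_s = \varphi_0/(1 + bs\varphi_0)$ (for $\varphi_0 > -1/b$), so the supremum can be taken pointwise.

The sketch below is our own illustration on a four-point space with the counting measure as reference. It maximizes $\rho_1 z/(1+bz) - \rho_0 z$ over $z > -1/b$ on a grid and compares the result with $\frac1b \mathrm{He}_2^2$. It is a sanity check of the normalization, not a proof. The function name is hypothetical.

```python
import numpy as np

def hellinger_dual_pointwise(rho0, rho1, b=1.0, n_grid=200_001):
    """Pointwise sup of rho1*phi_1 - rho0*phi_0 with phi_1 = z/(1 + b z), phi_0 = z.

    phi_s = z/(1 + b s z) solves d/ds phi + b phi^2 = 0. With w = 1 + b z > 0 the
    objective becomes (rho1 + rho0 - rho1/w - rho0*w)/b, searched on a log grid in w.
    """
    w = np.logspace(-4, 4, n_grid)
    vals = (rho1 + rho0 - rho1 / w - rho0 * w) / b
    return float(vals.max())

rho0 = np.array([0.1, 0.4, 0.3, 0.2])       # masses of mu_0 (counting reference measure)
rho1 = np.array([0.25, 0.25, 0.25, 0.25])   # masses of mu_1
b = 2.0

dual_value = sum(hellinger_dual_pointwise(r0, r1, b) for r0, r1 in zip(rho0, rho1))
hellinger_over_b = float(np.sum((np.sqrt(rho0) - np.sqrt(rho1)) ** 2)) / b
print(dual_value, hellinger_over_b)   # agree up to the grid discretization error
```

The pointwise maximizer is $w = \sqrt{\rho_1/\rho_0}$, with value $\frac1b(\sqrt{\rho_1} - \sqrt{\rho_0})^2$. Summing over points gives $W_{0,b} = \frac1b \mathrm{He}_2^2$, consistent with the scaling in Lemma 3.2 (ii).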
Thus, the distances $W_{a,b}$ naturally interpolate between the Kantorovich-Wasserstein distance, perhaps the most familiar transportation distance, and the Hellinger distance, which metrizes convergence in total variation. As will be seen in the next subsection, this makes them useful for obtaining inequalities relating these two distances.
Proposition 3.3 If $x_0, x_1 \in X$ and $\delta_{x_0}, \delta_{x_1} \in \mathcal{P}(X)$ are the corresponding Dirac measures, then
Proof. For $a = \frac12$, $b = 2$, this is [22, Eq. (6.31)]; see also [22, Section 8] for the explanation that the LET distance corresponds to $\mathrm{HK}^2$, which is our $W_{1/2,2}$. Other values of $a$ can be handled by rescaling the distance $d$, and general values of $a, b$ are then covered by Lemma 3.2 (ii).
We note, however, that the upper bound can be shown much more easily, and is comparable to the exact expression up to a universal constant factor (approximately 1.2). The upper bound $W_{a,b}(\mu_0,\mu_1) \le \frac{2}{b}$ is essentially trivial. Since $\mathcal{A}_{a,b} \subset \mathcal{A}_{0,b}$, we have $W_{a,b} \le W_{0,b} = \frac1b \mathrm{He}_2^2$, and $\mathrm{He}_2^2(\mu_0,\mu_1) \le 2$ for all $\mu_0, \mu_1$. The upper bound $W_{a,b}(\delta_{x_0},\delta_{x_1}) \le \frac{1}{4a} d(x_0,x_1)^2$ can be seen in a similar way by comparing to the Kantorovich-Wasserstein distance $W_{1/2,0}$. But it can also be shown directly from the "dynamic dual" definition of $W_{a,b}$. We give the argument here, as we shall use a similar argument in Proposition 4.8 below, and we shall also wish to adapt it to infinite-dimensional settings in which the assumptions of this section are not quite satisfied.
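Let $a > 0$ and $b \ge 0$. Recall that $(X,d)$ is assumed to be a length space, so there exists a constant speed geodesic $\gamma : [0,1] \to X$ joining $x_0$ to $x_1$: namely, $\gamma_0 = x_0$, $\gamma_1 = x_1$, and $d(\gamma_s, \gamma_t) = |t - s|\, d(x_0,x_1)$ for all $s, t \in [0,1]$. Let $\varphi \in \mathcal{A}_{a,b}$ and write $D = d(x_0,x_1)$. Now using the chain rule and the upper gradient property, we have
$$\varphi_1(x_1) - \varphi_0(x_0) \le \int_0^1 \left(\partial_s\varphi_s + D|\nabla\varphi_s|\right)(\gamma_s)\, ds \le \int_0^1 \left(-a|\nabla\varphi_s|^2 - b\varphi_s^2 + D|\nabla\varphi_s|\right)(\gamma_s)\, ds$$
$$= \int_0^1 \left(\frac{D^2}{4a} - a\left(|\nabla\varphi_s| - \frac{D}{2a}\right)^2 - b\varphi_s^2\right)(\gamma_s)\, ds$$
by completing the square. Discarding the two negative terms and taking the supremum over $\varphi \in \mathcal{A}_{a,b}$, we recover the desired bound.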
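The claimed constant of about 1.2 can be checked numerically under one assumption, which we state explicitly: that the exact value in Proposition 3.3 has the Hellinger-Kantorovich cosine profile $\frac{2}{b}\left(1 - \cos\left(\min\left(d\sqrt{b/(4a)}, \frac{\pi}{2}\right)\right)\right)$. This profile is our reconstruction, not a quotation of the proposition. We chose it because it matches both elementary bounds, $\frac{d^2}{4a}$ for small $d$ and $\frac2b$ for large $d$. Under this assumption the sketch below compares it with $\min\left(\frac2b, \frac{d^2}{4a}\right)$ on a grid of distances.

```python
import numpy as np

def upper_bound(d, a, b):
    # Elementary bound from the text: min(2/b, d^2/(4a)).
    return np.minimum(2.0 / b, d ** 2 / (4.0 * a))

def cosine_profile(d, a, b):
    # ASSUMED exact value (Hellinger-Kantorovich-type profile; our reconstruction):
    # (2/b) * (1 - cos(min(d * sqrt(b/(4a)), pi/2))).
    theta = np.minimum(d * np.sqrt(b / (4.0 * a)), np.pi / 2)
    return (2.0 / b) * (1.0 - np.cos(theta))

a, b = 0.7, 1.3                       # arbitrary positive parameters
d = np.linspace(1e-3, 10.0, 200_001)  # distances d(x_0, x_1)
ratio = upper_bound(d, a, b) / cosine_profile(d, a, b)
print(ratio.min(), ratio.max())
```

In terms of $\theta = d\sqrt{b/(4a)}$, the ratio equals $\theta^2/(2(1-\cos\theta))$ for $\theta \le \sqrt2$ and $1/(1-\cos\theta)$ for $\sqrt2 \le \theta \le \pi/2$. It is therefore maximized at $\theta = \sqrt2$, with value $1/(1-\cos\sqrt2) \approx 1.185$, independently of $a, b$.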

Functional inequalities
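Theorem 3.4 Suppose that for some $a > 0$ and $b, \gamma, \delta \ge 0$, the Markov operator $P$ satisfies the functional inequality
$$a|\nabla Pf|^2 + b(Pf)^2 \le \gamma P(|\nabla f|^2) + \delta P(f^2), \qquad f \in \mathrm{Lip}_b(X). \qquad (3.3)$$
Then we have the transportation distance contraction
$$W_{\gamma,\delta}(\mu_0 P, \mu_1 P) \le W_{a,b}(\mu_0,\mu_1), \qquad \mu_0, \mu_1 \in \mathcal{P}(X).$$
Proof. Let $\varphi \in \mathcal{A}_{\gamma,\delta}$. Since $a > 0$, (3.3) bounds $|\nabla P\varphi_s|$, so $P\varphi_s$ is again bounded and Lipschitz. Moreover, $\partial_s P\varphi_s = P(\partial_s\varphi_s)$, and
$$\partial_s P\varphi_s + a|\nabla P\varphi_s|^2 + b(P\varphi_s)^2 \le P\left(\partial_s\varphi_s + \gamma|\nabla\varphi_s|^2 + \delta\varphi_s^2\right) \le 0,$$
where we used the fact that $P$ is positivity preserving, and the assumed inequality (3.3). This shows that $P\varphi \in \mathcal{A}_{a,b}$. Thus for $\mu_0, \mu_1 \in \mathcal{P}(X)$ we have
$$\int_X \varphi_1\, d(\mu_1 P) - \int_X \varphi_0\, d(\mu_0 P) = \int_X P\varphi_1\, d\mu_1 - \int_X P\varphi_0\, d\mu_0 \le W_{a,b}(\mu_0,\mu_1),$$
and taking the supremum over $\varphi \in \mathcal{A}_{\gamma,\delta}$ gives the claim.
Corollary 3.5 If $P$ satisfies the gradient estimate $|\nabla Pf|^2 \le C\, P|\nabla f|^2$ for some $C$, then for any $b \ge 0$ we have
$$W_{C,b}(\mu_0 P, \mu_1 P) \le W_{1,b}(\mu_0,\mu_1).$$
In particular, taking $b = 0$ we recover the Kuwada-type duality
$$W_2(\mu_0 P, \mu_1 P) \le \sqrt{C}\, W_2(\mu_0,\mu_1).$$
The case $C = 1$ of Corollary 3.5 is [22, Theorem 8.24], and when additionally $b = 0$ it reduces to [20, Proposition 3.7].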
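Corollary 3.7 Suppose $P$ satisfies the reverse Poincaré inequality
$$|\nabla Pf|^2 \le C\left(P(f^2) - (Pf)^2\right), \qquad f \in \mathrm{Lip}_b(X),$$
for some $C > 0$. Then for all $\mu_0, \mu_1 \in \mathcal{P}(X)$,
$$\mathrm{He}_2^2(\mu_0 P, \mu_1 P) \le C\, W_{1,C}(\mu_0,\mu_1),$$
or in other notation
$$\mathrm{He}_2(\mu_0 P, \mu_1 P) \le \mathrm{HK}_{4/C}(\mu_0,\mu_1).$$
In Section 5 below, we study a number of examples which satisfy inequalities of these forms, beyond the $RCD(K,\infty)$ spaces considered in [23].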
Interestingly, Corollary 3.7 admits a converse. This will follow from the following lemma: Proof. Let $m$ be a Borel measure such that both $\delta_x P$ and $\delta_y P$ are absolutely continuous with respect to $m$. We denote We have Therefore, by the Cauchy-Schwarz inequality, One deduces the following converse to Corollary 3.7.
Corollary 3.9 Suppose that for every $\mu_0, \mu_1 \in \mathcal{P}(X)$ Proof. Assume that Then, for every $x, y \in X$, Therefore, from Lemma 3.8 one deduces Similarly, one has the conclusion follows from (3.5).

The dynamic dual formulation and basic properties
The notions discussed in the previous section can be modified to give a dynamic dual formulation of a family of divergences on $\mathcal{P}(X)$, which we will denote by $T_{a,b}$. In the same way that the distances $W_{a,b}$ included the Hellinger distance, the $T_{a,b}$ family will include the Rényi divergence. Likewise, where contractions of $W_{a,b}$ were related to reverse Poincaré inequalities, we will show that contractions of $T_{a,b}$ are related to reverse logarithmic Sobolev inequalities.
We denote by $\mathcal{E}_{a,b}$ the class of all positive functions $\varphi \in C^1([0,1], \mathrm{Lip}_b(X))$ satisfying the differential inequality Then for probability measures $\mu_0, \mu_1 \in \mathcal{P}(X)$ we set Comparing with Definition 3.1, we note that $T_{a,0} = W_{a,0}$ is again the (rescaled) Kantorovich-Wasserstein distance; the restriction to positive functions here does no harm, because the class of functions $\mathcal{A}_{a,0}$ is invariant under adding constants.
Proof. Since the case b = 0 reduces to the Kantorovich-Wasserstein distance, we suppose b > 0.
We have $\psi\ln\psi \le 0$, and since $\psi$ is Lipschitz, $|\nabla \ln\psi|^2$ is bounded. Hence by taking $\epsilon < \exp\left(-\frac{a}{b}\sup|\nabla\ln\psi|^2\right)$, we can ensure that $a\varphi|\nabla\ln\varphi|^2 + b\varphi\ln\varphi \le 0$ everywhere, and therefore $\varphi \in \mathcal{E}_{a,b}$. It follows that When $a = 0$, this produces the Rényi divergence, as we now show with the aid of two preliminary lemmas.
Now let $A \subset X$ be a Borel set with $\mu_0(A) = 0$, and let $0 < f_n < 1$ be a sequence of Lipschitz functions converging $\mu_0$-a.e. and $\mu_1$-a.e. to $1_A$. Replacing $f$ by $f_n$ in (4.3) and letting $n \to \infty$, we obtain Since $c > 0$ was arbitrary, this is only possible if $\mu_1(A) = 0$.
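Lemma 4.4 Let $\frac1p + \frac1q = 1$ and fix $z > 0$. Then for all $x > 0$ we have
$$z x^{1/p} - x \le \frac{z^q}{q\, p^{q-1}},$$
with equality if and only if $x = (z/p)^q$.
Proof. Using Young's inequality for products $uv \le \frac1p u^p + \frac1q v^q$ with $u = (px)^{1/p}$ and $v = p^{-1/p} z$, we have
$$z x^{1/p} = uv \le x + \frac{p^{-q/p} z^q}{q} = x + \frac{z^q}{q\, p^{q-1}},$$
since $q/p = q - 1$. Young's inequality becomes equality precisely when $u^p = v^q$, which in this case means $px = p^{1-q} z^q$, i.e. $x = (z/p)^q$.
Proof. Suppose $\varphi_s \in \mathcal{E}_{0,b}$ and let $f = \varphi_0$. Since the solution of the initial value problem $y' + (\ln p)\, y\ln y = 0$, To get the reverse inequality, let $f \in \mathrm{Lip}_b(X)$ and set $\varphi_s = f^{p^{-s}}$, so that as noted above we have $\partial_s\varphi_s + (\ln p)\varphi_s\ln\varphi_s = 0$. Hence $\varphi \in \mathcal{E}_{0,b}$, so Now replacing $f$ by a sequence $f_n \in \mathrm{Lip}_b(X)$ such that $f_n \to (\varrho/p)^q$ $\mu_0$-a.e., by Lemma 4.4 and Fatou's lemma we have as desired.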
Corollary 4.6 Following the notation of the previous proposition, we have where $D_q(\mu_1 \| \mu_0) := \frac{1}{q-1}\ln\int_X \left(\frac{d\mu_1}{d\mu_0}\right)^q d\mu_0$ is the Rényi divergence of order $q$.
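On a finite space, the chain Lemma 4.4 → Proposition 4.5 → Corollary 4.6 can be traced numerically. The sketch below is our own illustration, not part of the original argument. It takes $b = \ln p$ and uses the extremal family $\varphi_s = f^{p^{-s}}$ from the proof of Proposition 4.5, so that $\varphi_1 = f^{1/p}$. It then evaluates $\int (\varrho f^{1/p} - f)\, d\mu_0$ with $\varrho = d\mu_1/d\mu_0$. By Lemma 4.4 this is at most $\frac{1}{q p^{q-1}}\int\varrho^q\, d\mu_0$, with equality at $f = (\varrho/p)^q$. The last lines convert that value into $D_q(\mu_1\|\mu_0)$. The variable and function names are hypothetical.

```python
import numpy as np

p = 3.0
q = p / (p - 1.0)                       # conjugate exponent: 1/p + 1/q = 1

def young_gap(z, x):
    """Right side minus left side of Lemma 4.4; should be >= 0."""
    return z ** q / (q * p ** (q - 1)) - (z * x ** (1.0 / p) - x)

# Lemma 4.4 on a grid of (z, x) values, and the equality case x = (z/p)^q.
zs = np.linspace(0.05, 5.0, 60)
xs = np.logspace(-6, 3, 400)
Z, X = np.meshgrid(zs, xs)
min_gap = young_gap(Z, X).min()
eq_gap = np.abs(young_gap(zs, (zs / p) ** q)).max()

# Two measures on a 4-point space; rho = d(mu_1)/d(mu_0).
mu0 = np.array([0.1, 0.2, 0.3, 0.4])
mu1 = np.array([0.4, 0.3, 0.2, 0.1])
rho = mu1 / mu0

f_opt = (rho / p) ** q                  # optimizer suggested by Lemma 4.4
value_at_opt = np.sum(mu0 * (rho * f_opt ** (1.0 / p) - f_opt))
closed_form = np.sum(mu0 * rho ** q) / (q * p ** (q - 1))

# Random positive f never beat the closed form.
rng = np.random.default_rng(0)
random_values = [np.sum(mu0 * (rho * f ** (1.0 / p) - f))
                 for f in rng.uniform(1e-3, 5.0, size=(1000, 4))]

renyi_direct = np.log(np.sum(mu0 * rho ** q)) / (q - 1.0)
renyi_from_T = np.log(q * p ** (q - 1) * closed_form) / (q - 1.0)
print(min_gap, eq_gap, value_at_opt, closed_form, renyi_direct, renyi_from_T)
```

Under these assumptions, the supremum equals $\frac{1}{q\,p^{q-1}}\exp\left((q-1)D_q(\mu_1\|\mu_0)\right)$, which is how a Rényi bound is read off from a bound on the divergence.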
For later use, we record here an estimate for the T a,b divergence between two point masses. We need another elementary inequality first, which is easily proved via calculus.
Lemma 4.7 For every $\alpha, \beta > 0$ and every $x \ge 0$, we have Proof. Suppose $\varphi \in \mathcal{E}_{a,b}$, and as in the proof of the upper bound in Proposition 3.3, let $\gamma : [0,1] \to X$ be a constant speed geodesic joining $x_0$ to $x_1$. Using the chain rule, we have by completing the square. Now we conclude by applying Lemma 4.7 and taking the supremum over $\varphi \in \mathcal{E}_{a,b}$.

Functional inequalities
The key relationship between functional inequalities and contractions of the T a,b divergence is as follows.
Proof. By the chain rule (Lemma 2.2), we have where we used the bivariate Jensen inequality with the convex function $\psi(x,y) = x^2/y$. This is (4.7) with $a = 1$, $b = \delta = 0$, and $\gamma = C^2$, so the conclusion follows from Theorem 4.9.
In particular, when (4.9) holds and $\mu_i = \delta_{x_i}$ are point masses, we can combine Corollary 4.6, Proposition 4.8 and Corollary 4.11 to obtain where, as before, $p = e^C$ and $\frac1p + \frac1q = 1$.
In this section, we focus on applications of the transportation-type inequalities proven in Theorem 3.4 as a tool to prove convergence to equilibrium for Markov semigroups. We will mostly focus on applications of the transportation inequality which, according to Corollary 3.7, follows from the gradient-type bound
$$|\nabla P_t f|^2 \le C(t)\left(P_t(f^2) - (P_t f)^2\right).$$
The classical Kuwada duality, recovered in Corollary 3.5, was already illustrated as a tool to prove convergence to equilibrium in [3], so we will spend less time on it.

Diffusions with $\Gamma_2 \ge 0$
Let $\Delta$ be a locally subelliptic diffusion operator (see Section 1.2 in [4] for a definition of local subellipticity) on a smooth manifold $M$. For smooth functions $f, g : M \to \mathbb{R}$, we define the carré du champ operator as the symmetric first-order bilinear differential form given by
$$\Gamma(f,g) := \frac12\left(\Delta(fg) - f\Delta g - g\Delta f\right), \qquad (5.1)$$
and we write $\Gamma f := \Gamma(f,f)$. We assume that $\Delta$ is symmetric with respect to some smooth measure $\mu$, which means that for all smooth compactly supported functions $f, g \in C_0^\infty(M)$,
$$\int_M f\,\Delta g\, d\mu = \int_M g\,\Delta f\, d\mu.$$
There is an intrinsic distance associated to the operator $\Delta$ that we now describe. An absolutely continuous curve $\gamma : [0,T] \to M$ is said to be subunit for the operator $\Delta$ if for every smooth function $f : M \to \mathbb{R}$ we have $\left|\frac{d}{dt} f(\gamma(t))\right| \le \sqrt{(\Gamma f)(\gamma(t))}$. We then define the subunit length of $\gamma$ as $\ell_s(\gamma) = T$. Given $x, y \in M$, we denote by $S(x,y)$ the set of subunit curves joining $x$ to $y$, and we assume that $S(x,y) \neq \emptyset$ for every $x, y \in M$. For instance, if $\Delta$ is an elliptic operator, or if $\Delta$ is a sum of squares operator that satisfies Hörmander's condition, then this assumption is satisfied. Under this assumption,
$$d(x,y) := \inf\{\ell_s(\gamma) \mid \gamma \in S(x,y)\}$$
is a finite distance on $M$. We assume that the metric space $(M,d)$ is complete. In that case, by Propositions 1.20 and 1.21 in [4], the operator $\Delta$ is essentially self-adjoint on $C_0^\infty(M)$. The semigroup in $L^2(M,\mu)$ generated by $\Delta$ will be denoted by $(P_t)_{t\ge0}$. The Bakry $\Gamma_2$ operator is defined as
$$\Gamma_2(f,g) := \frac12\left(\Delta\Gamma(f,g) - \Gamma(f,\Delta g) - \Gamma(g,\Delta f)\right).$$
Theorem 5.1 Assume that for every $f \in C^\infty(M)$, $\Gamma_2(f,f) \ge 0$. Then, for every $\nu_1, \nu_2 \in \mathcal{P}_2(M)$ and $t > 0$, Therefore, if $\mu$ is a probability measure which belongs to $\mathcal{P}_2(M)$, then for every $x \in M$ and $t > 0$, and when $t \to +\infty$, $\delta_x P_t$ converges to $\mu$ in total variation for every $x \in M$.
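Proof. It follows from Bakry-Émery calculus that, since $\Gamma_2 \ge 0$, one has the following gradient bound, which holds for bounded Lipschitz functions $f$:
$$\Gamma(P_t f) \le P_t\Gamma(f).$$
In particular, this yields the reverse Poincaré inequality
$$\Gamma(P_t f) \le \frac{1}{2t}\left(P_t(f^2) - (P_t f)^2\right),$$
and thus the conclusion thanks to Theorem 3.4.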
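In the flat case $M = \mathbb{R}$, $\Delta = d^2/dx^2$, we have $\Gamma(f) = |f'|^2$ and $\Gamma_2(f) = |f''|^2 \ge 0$, so the reverse Poincaré inequality in the proof can be checked numerically. The sketch below is our illustration. It evaluates $P_t g(x) = \mathbb{E}\, g(x + \sqrt{2t}\,Z)$ with $Z \sim N(0,1)$ by Gauss-Hermite quadrature (NumPy's `hermegauss`). It then reports the slack $P_t(f^2) - (P_t f)^2 - 2t|\nabla P_t f|^2$, which should be nonnegative. For $f(x) = x$ the slack vanishes, so the inequality is an equality there.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: integrates g(z) * exp(-z^2/2) dz.
nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)        # now E[g(Z)] ~ sum(weights * g(nodes)), Z ~ N(0,1)

def heat(g, x, t):
    """P_t g(x) = E g(x + sqrt(2t) Z) for the semigroup generated by d^2/dx^2."""
    return np.sum(weights * g(x + np.sqrt(2.0 * t) * nodes))

def reverse_poincare_slack(f, df, x, t):
    """(P_t f^2 - (P_t f)^2) - 2t |d/dx P_t f|^2; nonnegative when Gamma_2 >= 0."""
    variance = heat(lambda y: f(y) ** 2, x, t) - heat(f, x, t) ** 2
    grad = heat(df, x, t)               # d/dx P_t f = P_t f' (translation invariance)
    return variance - 2.0 * t * grad ** 2

slack_values = [reverse_poincare_slack(np.sin, np.cos, x, t)
                for x in np.linspace(-3, 3, 13) for t in (0.05, 0.5, 2.0)]
linear_slack = reverse_poincare_slack(lambda y: y, lambda y: np.ones_like(y), 0.3, 0.7)
print(min(slack_values), linear_slack)
```

For $f = \sin$ the slack can be computed in closed form. It equals $e^{-2t}(\sinh 2t - 2t) + \sin^2 x\, e^{-2t}(e^{-2t} - 1 + 2t)$, which is nonnegative, so the numerical values only confirm the quadrature.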
Example 5.2 An example where the theorem applies is the case where $\Delta$ is the Laplace-Beltrami operator on a complete Riemannian manifold. In that case, the invariant measure $\mu$ is the Riemannian volume measure, and the assumption $\Gamma_2 \ge 0$ is equivalent to the nonnegativity of the Ricci curvature of $M$.

Remark 5.3
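If $\Gamma_2 \ge a\Gamma$ for some $a > 0$, then Bakry-Émery calculus also yields the gradient bound
$$\Gamma(P_t f) \le e^{-2at} P_t\Gamma(f),$$
which, by Theorem 3.4 (through Corollary 3.5), implies the following contraction property in the $W_2$ distance:
$$W_2(\nu_1 P_t, \nu_2 P_t) \le e^{-at} W_2(\nu_1, \nu_2).$$
This appears in [20] and [26].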
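A standard example with $\Gamma_2 \ge a\Gamma$ is the one-dimensional Ornstein-Uhlenbeck operator $\Delta = \frac{d^2}{dx^2} - a x \frac{d}{dx}$, for which $\Gamma_2(f) = |f''|^2 + a|f'|^2$. This example is our illustration, not one taken from the text. Its semigroup maps Gaussians to Gaussians: started from $N(m, s^2)$, the law at time $t$ is $N\left(m e^{-at},\ s^2 e^{-2at} + (1 - e^{-2at})/a\right)$. Since $W_2$ between one-dimensional Gaussians is $\sqrt{(m_1 - m_2)^2 + (s_1 - s_2)^2}$, the contraction in Remark 5.3 can be checked directly on random pairs of initial laws.

```python
import numpy as np

def ou_gaussian(m, s, t, a):
    """Law at time t of the OU process generated by f'' - a x f', started from N(m, s^2)."""
    mean = m * np.exp(-a * t)
    var = s ** 2 * np.exp(-2 * a * t) + (1.0 - np.exp(-2 * a * t)) / a
    return mean, np.sqrt(var)

def w2_gauss_1d(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return np.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

a = 0.8
rng = np.random.default_rng(1)
worst = 0.0
for _ in range(1000):
    m1, m2 = rng.normal(size=2) * 3
    s1, s2 = rng.uniform(0.01, 4.0, size=2)
    t = rng.uniform(0.0, 5.0)
    before = w2_gauss_1d(m1, s1, m2, s2)
    after = w2_gauss_1d(*ou_gaussian(m1, s1, t, a), *ou_gaussian(m2, s2, t, a))
    worst = max(worst, after / (np.exp(-a * t) * before))
print(worst)   # should not exceed 1 (up to rounding)
```

The mean part contracts exactly by $e^{-at}$. The map $s \mapsto \sqrt{s^2 e^{-2at} + c}$ with $c \ge 0$ has derivative at most $e^{-at}$, which is why the standard-deviation part contracts as well.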

Subelliptic operators
The assumption $\Gamma_2 \ge 0$ requires some form of ellipticity of $\Delta$. In order to generalize the previous theorem to truly subelliptic operators, one can make use of the generalized Γ-calculus developed in [5,8]. In addition to the differential form (5.1), we assume that $M$ is endowed with another smooth symmetric bilinear differential form, denoted by $\Gamma^Z$, satisfying for $f, g \in C^\infty(M)$ and $\Gamma^Z(f) = \Gamma^Z(f,f) \ge 0$. Let us assume that: (H.1) There exists an increasing sequence $h_k \in C_0^\infty(M)$ such that $h_k \nearrow 1$ on $M$, and

Let us then consider
As for $\Gamma$ and $\Gamma^Z$, we will freely use the notations Assume that for every $f \in C^\infty(M)$ and $\nu > 0$ Then, for every $\nu_1, \nu_2 \in \mathcal{P}_2(M)$ and $t > 0$, Therefore, if $\mu$ is a probability measure which belongs to $\mathcal{P}_2(M)$, then for every $x \in M$ and $t > 0$, and when $t \to +\infty$, $\delta_x P_t$ converges to $\mu$ in total variation for every $x \in M$.
Proof. It follows from Proposition 3.2 in [5] that and thus the conclusion follows from Corollary 3.7.
Example 5.5 An example where this theorem applies is the case where $\Delta$ is the sub-Laplacian on a compact H-type sub-Riemannian manifold, see [10]. In that case, the invariant measure $\mu$ is again the Riemannian volume measure, and the assumption (5.4) is equivalent to the nonnegativity of the horizontal Ricci curvature of $M$. This applies for instance to the sub-Laplacian on the special unitary group $SU(2)$, as well as to compact quotients of the Heisenberg group $\mathbb{H}^3$.

Non symmetric Ornstein-Uhlenbeck semigroups on Carnot groups
In this section, we show that the method also applies to hypoelliptic and non-symmetric diffusion operators. In particular, we prove a quantitative rate of convergence for the Ornstein-Uhlenbeck semigroups on Carnot groups. A Carnot group of step (or depth) $N$ is a simply connected Lie group $G$ whose Lie algebra can be written
$$\mathfrak{g} = \mathcal{V}_1 \oplus \cdots \oplus \mathcal{V}_N,$$
with $[\mathcal{V}_1, \mathcal{V}_i] = \mathcal{V}_{i+1}$ for $1 \le i \le N-1$ and $[\mathcal{V}_1, \mathcal{V}_N] = 0$. From the above properties, it is easily seen that Carnot groups are nilpotent. The number
$$Q = \sum_{i=1}^N i \dim \mathcal{V}_i$$
is called the homogeneous dimension of $G$. On $\mathfrak{g}$ we can consider the family of linear operators $\delta_t : \mathfrak{g} \to \mathfrak{g}$, $t > 0$, which act by scalar multiplication by $t^i$ on $\mathcal{V}_i$. These operators are Lie algebra automorphisms due to the grading and induce Lie group automorphisms $\Delta_t : G \to G$, which are called the canonical dilations of $G$. It is easily seen that there exists on $G$ a complete and smooth vector field $D$ such that This vector field $D$ is called the dilation vector field on $G$. If $X$ is a left (or right) invariant smooth horizontal vector field on $G$, we have for every $f \in C^\infty(G)$ and $t \ge 0$, Let us now pick a basis $V_1, \dots, V_d$ of the vector space $\mathcal{V}_1$. The vectors $V_i$ can be seen as left invariant vector fields on $G$; in the sequel, these vector fields shall still be denoted by $V_1, \dots, V_d$. The left invariant sub-Laplacian on $G$ is the operator: It is essentially self-adjoint on the space of smooth compactly supported functions with respect to the Haar measure $\mu$ of $G$. The heat semigroup $(P_t)_{t\ge0}$ on $G$ generated by the sub-Laplacian, defined through the spectral theorem, is then a Markov semigroup. We are interested here in the non-symmetric Ornstein-Uhlenbeck operator defined by where $\alpha > 0$. This operator generates a Markov semigroup $(Q_t)_{t\ge0}$ which is given by Mehler's formula It is clear that the probability measure $\delta_e P_{1/\alpha}$ is invariant for $Q_t$, where $e$ denotes the identity element of $G$. Note that $\delta_e P_{1/\alpha}$ is the heat kernel measure started from $e$ in $G$.
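The simplest non-abelian example is the Heisenberg group $\mathbb{H}^3$ (mentioned in Example 5.5). It is a Carnot group of step 2, with a two-dimensional first layer and a one-dimensional second layer. In exponential coordinates, with the group law $(x_1,y_1,z_1)(x_2,y_2,z_2) = (x_1+x_2,\ y_1+y_2,\ z_1+z_2+\frac12(x_1y_2 - y_1x_2))$, the dilations act by $t$ on the first layer and by $t^2$ on the second. The coordinates and the normalization of the group law are our choice; other conventions are common. The sketch below checks numerically that these dilations are group automorphisms, and computes the homogeneous dimension $Q = \sum_i i \dim(\text{layer } i) = 4$.

```python
import numpy as np

def heis_mult(p, q):
    """Group law on the Heisenberg group H^3 = R^3 in exponential coordinates."""
    x1, y1, z1 = p
    x2, y2, z2 = q
    return np.array([x1 + x2, y1 + y2, z1 + z2 + 0.5 * (x1 * y2 - y1 * x2)])

def dilation(t, p):
    """Canonical dilation: scale the first layer by t and the second layer by t^2."""
    x, y, z = p
    return np.array([t * x, t * y, t ** 2 * z])

rng = np.random.default_rng(3)
max_err = 0.0
for _ in range(100):
    p, q = rng.normal(size=3), rng.normal(size=3)
    t = rng.uniform(0.1, 5.0)
    lhs = dilation(t, heis_mult(p, q))
    rhs = heis_mult(dilation(t, p), dilation(t, q))
    max_err = max(max_err, np.abs(lhs - rhs).max())

homogeneous_dim = 1 * 2 + 2 * 1     # sum of i * dim(layer i): dims 2 and 1
print(max_err, homogeneous_dim)
```

The automorphism property holds because the bracket term $x_1y_2 - y_1x_2$ is bilinear in first-layer coordinates, so it scales by $t^2$, exactly like the second layer.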
From known heat kernel estimates in Carnot groups (see [25]), one easily sees that the invariant measure $\delta_e P_{1/\alpha}$ belongs to $\mathcal{P}_2(G)$. The next theorem proves exponentially fast convergence to equilibrium for $Q_t$ with a quantitative rate.

Theorem 5.6
For every $x \in G$ and $t > 0$, Proof. We denote by $\nabla_H$ the horizontal gradient on $G$ given by $\nabla_H f = \sum_{i=1}^d (V_i f)\, V_i$. The following gradient bound was proved in [6]: . Thus, $Q_t(f^2)$ and the conclusion follows as before from Corollary 3.7.

Langevin type dynamics driven by Lévy processes
In this subsection, we work in the space $X = \mathbb{R}^n$ with its usual Euclidean distance and gradient. Let $(N_t)_{t\ge0}$ be a Lévy process in $\mathbb{R}^n$, i.e. a càdlàg stochastic process with stationary and independent increments. We assume that $N_0 = 0$ a.s. and that for every $T > 0$, $\mathbb{E}\sup_{t\in[0,T]}\|N_t\|^2 < +\infty$. In $\mathbb{R}^n$, we consider the following stochastic differential equation with additive noise: where $U : \mathbb{R}^n \to \mathbb{R}$ is a $C^2$ function. For simplicity, we assume that $\nabla U$ is Lipschitz. It is then easily proved that (5.5) has a unique solution $(X_t^x)_{t\ge0}$ for any $x \in \mathbb{R}^n$, which moreover satisfies $\mathbb{E}\sup_{t\in[0,T]}\|X_t^x\|^2 < +\infty$ for every $T > 0$. For $t \ge 0$, we denote by $P_t$ the Markov kernel defined by
$$P_t f(x) := \mathbb{E}\left(f(X_t^x)\right).$$
It is a contraction semigroup in $L^\infty(\mathbb{R}^n)$, and by square integrability, for every $\mu \in \mathcal{P}_2(\mathbb{R}^n)$ and $t \ge 0$ we have $\mu P_t \in \mathcal{P}_2(\mathbb{R}^n)$.

Convergence to equilibrium in the Kantorovich-Wasserstein distance
Let $\nabla^2 U$ denote the Hessian of $U$.
Theorem 5.8 Assume that there exists $a > 0$ such that $\nabla^2 U \ge a$ (uniformly, in the sense of quadratic forms). Then there exists a unique probability measure $\mu$ in the Wasserstein space $\mathcal{P}_2(\mathbb{R}^n)$ such that $\mu P_t = \mu$ for every $t \ge 0$. Moreover, for every $t \ge 0$ and $\nu \in \mathcal{P}_2(\mathbb{R}^n)$, one has
$$W_2(\nu P_t, \mu) \le e^{-at} W_2(\nu, \mu).$$
Proof. We proceed in several steps.
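Step 1: Proving the Bakry-Émery type estimate. Let $J_t = \frac{\partial X_t^x}{\partial x}$ be the first variation process associated with equation (5.5). Since $P_t f(x) = \mathbb{E}(f(X_t^x))$, by the chain rule we have
$$\nabla P_t f(x) = \mathbb{E}\left(J_t^* \nabla f(X_t^x)\right).$$
Therefore, by the Cauchy-Schwarz inequality,
$$|\nabla P_t f(x)|^2 \le \mathbb{E}\left(|J_t^*|^2\right)\, \mathbb{E}\left(|\nabla f(X_t^x)|^2\right).$$
Since $\mathbb{E}(|\nabla f(X_t^x)|^2) = P_t(|\nabla f|^2)(x)$, we are left to estimate $\mathbb{E}(|J_t^*|^2)$. To this end, we observe that, since the noise is additive, $J_t$ solves the linear equation $\frac{d}{dt} J_t = -\nabla^2 U(X_t^x) J_t$ with $J_0 = \mathrm{Id}$, so that for every $v \in \mathbb{R}^n$
$$\frac{d}{dt}|J_t v|^2 = -2\left\langle \nabla^2 U(X_t^x) J_t v,\ J_t v\right\rangle.$$
From the assumption $\nabla^2 U \ge a$ this yields $|J_t v|^2 \le e^{-2at}|v|^2$. One concludes $\mathbb{E}(|J_t^*|^2) \le e^{-2at}$ and therefore
$$|\nabla P_t f|^2 \le e^{-2at} P_t(|\nabla f|^2).$$
By Kuwada duality (Corollary 3.5), this yields that for every $\nu_0, \nu_1 \in \mathcal{P}_2(\mathbb{R}^n)$,
$$W_2(\nu_0 P_t, \nu_1 P_t) \le e^{-at} W_2(\nu_0, \nu_1). \qquad (5.7)$$
Step 2: Proving the existence and uniqueness of the invariant measure.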
Let $t > 0$. Thanks to (5.7), the map $\nu \mapsto \nu P_t$ is a strict contraction from $\mathcal{P}_2(\mathbb{R}^n)$ into itself. Since $\mathcal{P}_2(\mathbb{R}^n)$ is a complete metric space, the Banach fixed point theorem shows that this map admits a unique fixed point; call it $\mu_t$. We then have $\mu_t P_t = \mu_t$ for every $t > 0$. Composing with $P_s$ yields $\mu_t P_t P_s = \mu_t P_s$. Since $(P_t)$ is a semigroup, one has $P_t P_s = P_s P_t$. Therefore $\mu_t P_s P_t = \mu_t P_s$, which means that $\mu_t P_s$ is invariant for $P_t$. By uniqueness this implies $\mu_t P_s = \mu_t$. Using now the uniqueness of the invariant measure for $P_s$ yields $\mu_t = \mu_s$. As a conclusion, $\mu_t$ is independent of $t$, and we call it $\mu$. Finally, since $\mu P_t = \mu$, applying (5.7) with $\nu_1 = \mu$ gives the stated rate of convergence.
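The mechanism behind Step 1 is a synchronous coupling. With additive noise, two solutions driven by the same noise path satisfy $\frac{d}{dt}(X_t^x - X_t^y) = -(\nabla U(X_t^x) - \nabla U(X_t^y))$. The noise cancels, and $\nabla^2 U \ge a$ forces $|X_t^x - X_t^y| \le e^{-at}|x - y|$. The sketch below illustrates this in one dimension with an Euler scheme. We assume the drift is $-\nabla U$ and take the hypothetical potential $U(x) = \frac{a}{2}x^2 + \log\cosh x$, which satisfies $a \le U'' \le a + 1$. The driving noise is a Brownian part plus compound Poisson jumps, a simple Lévy process. For step sizes $h < 1/(a+1)$, the mean value theorem gives $|\Delta_{k+1}| \le (1 - ha)|\Delta_k| \le e^{-ha}|\Delta_k|$ exactly, so the printed ratio should not exceed 1.

```python
import numpy as np

a = 0.5                                   # lower bound on U'' (convexity constant)
h = 0.01                                  # step size; h < 1/(a + 1) keeps 1 - h U'' in (0, 1 - h a]
n_steps = 1000                            # horizon T = n_steps * h = 10
lam = 2.0                                 # jump intensity of the compound Poisson part

def grad_U(x):
    # Hypothetical potential U(x) = a x^2 / 2 + log cosh(x), so a <= U'' <= a + 1.
    return a * x + np.tanh(x)

rng = np.random.default_rng(42)
x, y = 3.0, -1.0                          # two starting points, same driving noise
delta0 = abs(x - y)
worst_ratio = 0.0
for k in range(1, n_steps + 1):
    # One increment of a Levy process: Brownian part plus compound Poisson jumps.
    noise = np.sqrt(2 * h) * rng.normal()
    noise += rng.normal(size=rng.poisson(lam * h)).sum()
    x = x - h * grad_U(x) + noise         # Euler step for X^x
    y = y - h * grad_U(y) + noise         # same noise: synchronous coupling
    bound = np.exp(-a * k * h) * delta0   # continuous-time rate e^{-a t} |x - y|
    worst_ratio = max(worst_ratio, abs(x - y) / bound)
print(worst_ratio)
```

Taking expectations of $|X_t^x - X_t^y|^2$ under this coupling bounds $W_2(\delta_x P_t, \delta_y P_t)$ by $e^{-at}|x - y|$. This is the point-mass version of (5.7).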

Convergence to equilibrium in the Hellinger distance
Our next application shows that in the diffusion case one can prove convergence to equilibrium for the Langevin dynamics without assuming coercivity of the Hessian of the potential (i.e. $\nabla^2 U \ge a > 0$). The price to pay is a convergence rate which is polynomial rather than exponential. We now assume that $(N_t)_{t\ge0}$ is a Brownian motion in $\mathbb{R}^n$. In that case, the invariant measure of (5.5) is known explicitly, and is given, up to a normalization constant, by $e^{-U(x)}\,dx$.
Theorem 5.9 Assume that the normalized invariant measure $d\mu = \frac{1}{Z} e^{-U(x)}\,dx$ is a probability measure with finite second moment and that $\nabla^2 U \ge 0$ ($U$ convex). Then, for every $x \in \mathbb{R}^n$ In particular, the law of $X_t^x$ converges in total variation to $\mu$ as $t \to +\infty$.
Proof. From the Bismut-Elworthy-Li formula we have for every $v \in \mathbb{R}^n$ where, as before, $J_t = \frac{\partial X_t^x}{\partial x}$ is the first variation process associated with equation (5.5). From the Cauchy-Schwarz inequality, and the fact that One concludes that for every $v \in \mathbb{R}^n$, This yields and thus the expected result by Corollary 3.7.
Remark 5.10 Theorem 5.9 might also be proven using Theorem 5.1 above. However, we wanted to illustrate the use of the Bismut-Elworthy-Li formula as a tool to prove reverse Poincaré inequalities.

Applications to quasi-invariance
In this section, we focus on the applications of the duality between the reverse log-Sobolev inequality and the Rényi divergence estimate, described in Corollary 4.11. As shown in Lemma 4.3, $T_{0,b}(\mu_0,\mu_1) < \infty$ implies that $\mu_1$ is absolutely continuous with respect to $\mu_0$. Applying this with the roles of $\mu_0, \mu_1$ exchanged gives mutual absolute continuity, so Corollary 4.11 is a convenient tool for proving absolute continuity of measures. Specifically, suppose $X = G$ is a complete separable metric group, with identity $e$. (In this section, we drop the assumption that $G$ is a proper metric space, and it need not be locally compact.) Suppose the Markov kernel $P$ is left invariant with respect to the group translation, and let $\mu = \delta_e P$. Then for $x \in G$, $\mu_x := \delta_x P$ is the left translation of $\mu$ by $x$. If $\mu_x$ and $\mu$ are mutually absolutely continuous, we say that $\mu$ is quasi-invariant under left translation by $x$. This is an important regularity property of the measure $\mu$. For instance, if $G$ is locally compact and $\mu$ is quasi-invariant under all left translations, then $\mu$ is absolutely continuous with respect to the Haar measure of $G$.
In this section, we show how reverse logarithmic Sobolev inequalities can be used to prove quasi-invariance statements via the T a,b divergences. The results discussed in this section are not new and can be obtained by other methods, some classical and some more recent. Our main purpose here is rather to illustrate the techniques and explore this interesting variant route to the theorems we discuss. However, we do expect that these ideas may be useful for new results going forward, particularly for applications in stochastic PDE, which we hope to explore in future work.

Subelliptic heat kernels on finite-dimensional Lie groups
Let $G$ be a finite-dimensional connected real Lie group, and suppose that $G$ is equipped with a left-invariant sub-Riemannian geometry: a bracket-generating left-invariant sub-bundle $H \subset TG$, and a sub-Riemannian metric $g$ which is a left-invariant inner product on $H$. We denote by $\nabla$ the horizontal sub-gradient, and $|\nabla f| := \sqrt{g(\nabla f, \nabla f)}$. Let $d$ be the Carnot-Carathéodory distance on $G$. Let $L$ be the left-invariant sub-Laplacian induced by $g$, $P_t = e^{tL}$ the heat semigroup generated by $L$, and $\mu_t = \delta_e P_t$ the heat kernel measure.
Under these conditions, Hörmander's theorem implies that $L$ is subelliptic, and hence $\mu_t$ is a smooth measure for all $t > 0$. Our purpose here is to remark that at least part of this conclusion can be recovered using our techniques instead, if one has a reverse log Sobolev inequality. Proposition 6.1 Suppose, under the above assumptions, that $P_t$ satisfies the reverse logarithmic Sobolev inequality Then $\mu_t$ is quasi-invariant under translation by every $x \in G$. As such, $\mu_t$ is absolutely continuous with respect to Haar measure and has full support.
Proof. This follows immediately from Corollary 4.11, Proposition 4.8 and Lemma 4.3. Note that a key fact is that $d(e, x) < \infty$ for every $x \in G$, as a consequence of the Chow-Rashevskii theorem.
By the results in [5], the reverse log Sobolev inequality holds in sub-Riemannian manifolds satisfying a generalized curvature-dimension inequality of the type introduced in [8]. It was shown in [8] that such inequalities hold for step two Carnot groups and the three-dimensional model groups $SU(2)$ and $SL(2)$, and in [7] for three-dimensional solvable groups.

Abstract Wiener space
The phenomenon of quasi-invariance is more interesting in groups that are not locally compact, where the regularity of a measure cannot be described in terms of absolute continuity to Haar measure, since Haar measure does not exist.
In this subsection, we consider the very classical example of abstract Wiener space. As this and similar infinite-dimensional models do not fit exactly into the setting defined in Section 2, we shall briefly discuss how to adapt the results of Sections 3 and 4 in this case, as a prototype for later examples. We give basic definitions here to fix notation; for further background on abstract Wiener space and Gaussian measures on infinite-dimensional spaces, we refer to [11,19].
An abstract Wiener space consists of a real separable Banach space $W$ equipped with a centered non-degenerate Gaussian Borel measure $\mu$. We denote by $H \subset W$ the associated dense Cameron-Martin space, into which the continuous dual $W^*$ is naturally embedded. A smooth cylinder function is a function $F : W \to \mathbb{R}$ of the form
$$F(x) = \varphi(f_1(x), \dots, f_n(x)),$$
where $\varphi : \mathbb{R}^n \to \mathbb{R}$ is a smooth function with all partial derivatives bounded, and $f_1, \dots, f_n \in W^* \subset H$; unless otherwise specified, we assume without loss of generality that $f_1, \dots, f_n$ are orthonormal in $H$. We let $\mathrm{Cyl}(W)$ denote the space of all such functions; it is a standard fact that $\mathrm{Cyl}(W)$ is dense in $L^p(\mu)$ for $1 \le p < \infty$. The Malliavin gradient $DF : W \to H$ of a cylinder function is defined by
$$DF(x) := \sum_{i=1}^n \partial_i\varphi(f_1(x), \dots, f_n(x))\, f_i,$$
so that, by orthonormality,
$$\|DF(x)\|_H^2 = \sum_{i=1}^n |\partial_i\varphi(f_1(x), \dots, f_n(x))|^2 = |\nabla\varphi(f_1(x), \dots, f_n(x))|^2.$$
Note that $\|DF\|_H$ is not a strong upper gradient on $W$.
The heat semigroup $P_t$ on $W$ is the convolution semigroup induced by the rescaled measure $\mu$, namely $P_t F(x) = \int_W F(x + \sqrt{t}\,y)\, \mu(dy)$. When $F$ is a cylinder function $F(x) = \varphi(f_1(x), \dots, f_n(x))$, we have $P_t F(x) = p_t\varphi(f_1(x), \dots, f_n(x))$, where $p_t$ is the standard heat semigroup on $\mathbb{R}^n$; in particular, $P_t F$ is again a cylinder function.
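With this normalization, $p_t\varphi(u) = \mathbb{E}\,\varphi(u + \sqrt{t}\,Z)$ with $Z$ standard Gaussian, i.e. the semigroup generated by $\frac12\Delta$. In one dimension, the standard forms of the two inequalities recalled in the next paragraph are $|\nabla p_t\varphi|^2 \le \frac1t\left(p_t(\varphi^2) - (p_t\varphi)^2\right)$ and, for $\varphi > 0$, $\frac{|\nabla p_t\varphi|^2}{p_t\varphi} \le \frac2t\left(p_t(\varphi\ln\varphi) - p_t\varphi\ln p_t\varphi\right)$. These explicit constants are our statement of the standard $\Gamma_2 \ge 0$ results for this normalization. The sketch below checks both by Gauss-Hermite quadrature for the positive test function $\varphi = 2 + \sin$. It also checks the equality case $\varphi = e^u$ of the logarithmic inequality. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(100)
weights = weights / np.sqrt(2.0 * np.pi)   # expectation against N(0, 1)

def p(t, g, u):
    """p_t g(u) = E g(u + sqrt(t) Z): heat semigroup with generator (1/2) d^2/du^2."""
    return np.sum(weights * g(u + np.sqrt(t) * nodes))

def slacks(phi, dphi, u, t):
    """Slacks of the reverse Poincare and reverse log-Sobolev inequalities (both >= 0)."""
    m = p(t, phi, u)
    grad = p(t, dphi, u)                     # d/du p_t phi = p_t phi'
    var = p(t, lambda v: phi(v) ** 2, u) - m ** 2
    ent = p(t, lambda v: phi(v) * np.log(phi(v)), u) - m * np.log(m)
    rev_poincare = var / t - grad ** 2
    rev_logsob = 2.0 * ent / t - grad ** 2 / m
    return rev_poincare, rev_logsob

phi = lambda v: 2.0 + np.sin(v)              # a positive test function
pairs = [slacks(phi, np.cos, u, t) for u in np.linspace(-3, 3, 13) for t in (0.1, 1.0, 4.0)]
exp_slack = slacks(np.exp, np.exp, 0.3, 1.0)[1]   # equality case of the log-Sobolev form
print(min(rp for rp, _ in pairs), min(ls for _, ls in pairs), exp_slack)
```

Because these constants do not depend on the dimension $n$, the same inequalities transfer to $P_t$ on $W$ through cylinder functions, which is the point made next.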
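We recall that $p_t$ satisfies the reverse Poincaré inequality
$$t\,|\nabla p_t \varphi|^2 \le p_t(\varphi^2) - (p_t \varphi)^2$$
and, for positive $\varphi$, the reverse logarithmic Sobolev inequality
$$\frac{t}{2}\,\frac{|\nabla p_t \varphi|^2}{p_t \varphi} \le p_t(\varphi \log \varphi) - p_t \varphi \log p_t \varphi.$$
These follow, for instance, by standard $\Gamma$-calculus from the elementary commutation $\nabla p_t \varphi = p_t \nabla \varphi$; see for instance [2, Proposition 3.3], taking $\rho = 0$. Note that the constants in these inequalities are dimension-independent. Hence, evaluating at $(f_1(x), \dots, f_n(x))$, $x \in W$, we obtain the corresponding inequalities for $P_t$ on $(W, \mu)$:
$$t\,\|DP_t F\|_H^2 \le P_t(F^2) - (P_t F)^2, \qquad \frac{t}{2}\,\frac{\|DP_t F\|_H^2}{P_t F} \le P_t(F \log F) - P_t F \log P_t F,$$
for cylinder functions $F$ (positive in the second case), and define the distances $W_{a,b}$, $T_{a,b}$ accordingly on $\mathcal{P}(W)$. We have $W_{0,b}$ and $T_{0,b}$ related to the Hellinger and Rényi distances in the same way as before. Moreover, we can follow the proof of Proposition 4.8, taking $\gamma(s) = s x_1 + (1 - s) x_0$ and noting that, when $x_1 - x_0 \in H$, $\frac{d}{ds} F(\gamma(s)) = \langle DF(\gamma(s)), x_1 - x_0 \rangle_H$.

Now Theorem 4.9 allows us to recover the classical Cameron-Martin quasi-invariance theorem [13]. For $t > 0$, let $\mu_t = \mu(t^{-1/2}\,\cdot) = \delta_0 P_t$ be the rescaling of the Gaussian measure $\mu$, and for $h \in H$ let $\mu_t^h = \mu(t^{-1/2}(\cdot - h)) = \delta_h P_t$ be its translation by $h$. We have:

Proposition 6.2 (Cameron-Martin theorem) For all $t > 0$ and $h \in H$, the measures $\mu_t$, $\mu_t^h$ are mutually absolutely continuous.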
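The reverse Poincaré inequality $t\,|\nabla p_t\varphi|^2 \le p_t(\varphi^2) - (p_t\varphi)^2$ and the reverse logarithmic Sobolev inequality $\frac{t}{2}|\nabla p_t\varphi|^2/p_t\varphi \le p_t(\varphi\log\varphi) - p_t\varphi\log p_t\varphi$ for the heat semigroup can be checked numerically in one dimension. The sketch below is an illustration only. It assumes the normalization $p_t\varphi(z) = \mathbb{E}[\varphi(z + \sqrt{t}\,Z)]$ with $Z \sim N(0,1)$ (variance $t$, matching $P_t F(x) = \int F(x + \sqrt{t}\,y)\,\mu(dy)$), evaluates Gaussian expectations with NumPy's probabilists' Gauss-Hermite rule `hermegauss`, and uses an arbitrary positive test function. The linear function $\varphi(z) = z$ and the exponential $\varphi(z) = e^z$ give equality, which shows that the constants $t$ and $t/2$ cannot be improved.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: the weights integrate against exp(-z^2/2),
# so dividing by sqrt(2 pi) turns sum(W * g(Z)) into an approximation of E[g(Z)].
Z, W = hermegauss(80)
W = W / np.sqrt(2.0 * np.pi)

def p(t, g, x):
    """(p_t g)(x) = E[g(x + sqrt(t) Z)], Z ~ N(0, 1): the variance-t heat semigroup."""
    return np.sum(W * g(x + np.sqrt(t) * Z))

def poincare_gap(t, phi, dphi, x):
    """RHS - LHS of t |(p_t phi)'|^2 <= p_t(phi^2) - (p_t phi)^2.
    Uses (p_t phi)' = p_t(phi'), i.e. the commutation grad p_t = p_t grad."""
    lhs = t * p(t, dphi, x) ** 2
    rhs = p(t, lambda y: phi(y) ** 2, x) - p(t, phi, x) ** 2
    return rhs - lhs

def lsi_gap(t, phi, dphi, x):
    """RHS - LHS of (t/2) |(p_t phi)'|^2 / p_t phi <= p_t(phi log phi) - p_t phi log p_t phi,
    for positive phi."""
    m = p(t, phi, x)
    lhs = 0.5 * t * p(t, dphi, x) ** 2 / m
    rhs = p(t, lambda y: phi(y) * np.log(phi(y)), x) - m * np.log(m)
    return rhs - lhs

# An arbitrary positive test function with bounded derivative.
phi = lambda y: np.exp(np.sin(y)) + 1.0
dphi = lambda y: np.cos(y) * np.exp(np.sin(y))

grid = [(t, x) for t in (0.1, 0.5, 1.0, 2.0) for x in np.linspace(-3.0, 3.0, 13)]
min_poincare = min(poincare_gap(t, phi, dphi, x) for t, x in grid)
min_lsi = min(lsi_gap(t, phi, dphi, x) for t, x in grid)

# Equality cases: phi(z) = z for the Poincare inequality, phi(z) = e^z for log-Sobolev.
eq_poincare = max(abs(poincare_gap(t, lambda y: y, np.ones_like, x)) for t, x in grid)
eq_lsi = max(abs(lsi_gap(t, np.exp, np.exp, x)) for t, x in grid)
```

Quadrature is used instead of Monte Carlo so that the gaps are accurate to near machine precision, which matters near the equality cases.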
2, so to work around this, choose an integer $n$ so large that $\frac{1}{4t} n^{-2} \|h\|_H^2 < 2$. Applying (6.12) with $h/n$ in place of $h$, we conclude that $\mathrm{He}_2^2(\mu_t, \mu_t^{h/n}) < 2$, and in particular that $\mu_t$ and $\mu_t^{h/n}$ are not mutually singular. By the Feldman-Hájek dichotomy theorem for Gaussian measures, they must therefore be mutually absolutely continuous, which we denote by $\mu_t \sim \mu_t^{h/n}$. Since translation by $kh/n$ maps the pair $(\mu_t, \mu_t^{h/n})$ to $(\mu_t^{kh/n}, \mu_t^{(k+1)h/n})$ and preserves mutual absolute continuity, repeating this argument $n$ times gives $\mu_t \sim \mu_t^{h/n} \sim \mu_t^{2h/n} \sim \cdots \sim \mu_t^{h}$, and since $\sim$ is an equivalence relation, $\mu_t \sim \mu_t^h$ as desired. Although this argument uses only the reverse Poincaré inequality, which is a priori weaker than the reverse logarithmic Sobolev inequality used in Proposition 6.2, the conclusion is also weaker, as it does not yield any quantitative information about the distance between the measures $\mu_t$ and $\mu_t^h$. We note that some proofs of the Feldman-Hájek dichotomy theorem, including Feldman's original proof [17,18], make use of the Cameron-Martin quasi-invariance theorem, which would seem to make the above argument circular. However, it is possible to prove the dichotomy theorem directly, without assuming quasi-invariance (see for example [12]), and this breaks the cycle.
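The threshold in the argument above can be compared with an exact computation in one dimension, where $\|h\|_H = |h|$ and $\mu_t = N(0, t)$. The sketch assumes the normalization $\int (\sqrt{d\mu/d\lambda} - \sqrt{d\nu/d\lambda})^2\, d\lambda$ for the squared Hellinger distance (values in $[0, 2]$, with $2$ exactly for mutually singular measures), under which $N(0, t)$ and $N(h, t)$ are at squared distance $2(1 - e^{-h^2/(8t)})$. It also assumes, as the choice of $n$ suggests, that (6.12) bounds this quantity by $\|h\|_H^2/(4t)$. The helper names are hypothetical. The point illustrated is that the bound carries information only when it is below $2$, which is why $h$ is split into $n$ steps.

```python
import math

def hellinger_sq_shift(h, t):
    """Squared Hellinger distance between N(0, t) and N(h, t) on the real line,
    assuming the normalization int (sqrt(p) - sqrt(q))^2 dx (values in [0, 2])."""
    return 2.0 * (1.0 - math.exp(-h * h / (8.0 * t)))

def bound_6_12(h, t):
    """Assumed form of the bound (6.12): ||h||_H^2 / (4 t)."""
    return h * h / (4.0 * t)

def n_steps_needed(h, t):
    """Smallest integer n >= 1 with ||h||_H^2 / (4 t n^2) < 2."""
    n = 1
    while bound_6_12(h / n, t) >= 2.0:
        n += 1
    return n

h, t = 10.0, 1.0
n = n_steps_needed(h, t)  # n == 4 here, since 100 / (4 n^2) < 2 needs n^2 > 12.5
print(n, bound_6_12(h, t), bound_6_12(h / n, t), hellinger_sq_shift(h / n, t))
```

For the full shift the bound equals $25$ and says nothing, while each of the four steps of size $h/4$ has bound $1.5625 < 2$.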

Infinite dimensional Heisenberg-like groups
The ideas of the previous two subsections come together in the study of infinite-dimensional groups where the semigroup in question is not elliptic. In [9], the authors considered the infinite-dimensional Heisenberg-like groups introduced in [15], with their hypoelliptic heat kernels and corresponding heat semigroups. These groups carry a natural sub-Riemannian geometry analogous to that of the Heisenberg group and other step-two Carnot groups. The authors of [9] use generalized curvature-dimension inequalities to show that these spaces satisfy a reverse logarithmic Sobolev inequality; from this they derive a Wang-type Harnack inequality, and use it to show quasi-invariance of the heat kernel measure under group translations. In this subsection, we show that, as in the case of Gaussian measures, transport inequalities provide an alternative route from reverse log Sobolev to quasi-invariance in this setting. We only sketch the argument here, as the details are closely analogous to those for the Gaussian case.
We follow the notation of [9] and refer the reader there for complete definitions, background, and further references. Let $(W, H, \mu)$ be an abstract Wiener space and $C$ a finite-dimensional inner product space. Suppose that $\mathfrak{g} = W \times C$ is equipped with a continuous Lie bracket $[\cdot, \cdot]$ satisfying $[W, W] = C$ and $[\mathfrak{g}, C] = 0$. The corresponding Banach Lie group $G$ is given by $G = W \times C$ equipped with the nonabelian group operation $g_1 \cdot g_2 = g_1 + g_2 + \frac{1}{2}[g_1, g_2]$ defined by the Baker-Campbell-Hausdorff formula. Then $\mathfrak{g}_{CM} = H \times C$ is a dense Lie subalgebra of $\mathfrak{g}$, called the Cameron-Martin Lie subalgebra, and likewise $G_{CM} = H \times C \subset G$ is a dense subgroup of $G$.
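A finite-dimensional toy version of this construction may help fix the group law. The sketch below assumes $W = \mathbb{R}^2$, $C = \mathbb{R}$ and the bracket $[(w, c), (w', c')] = (0, w_1 w_2' - w_2 w_1')$, which satisfies $[W, W] = C$ and $[\mathfrak{g}, C] = 0$ and yields the three-dimensional Heisenberg group; it is an illustration, not the infinite-dimensional setting of [9]. Exact rational arithmetic via Python's `fractions.Fraction` lets group elements be compared with `==`. The checks cover associativity, the identity $0$, inverses $g^{-1} = -g$, and the fact that the group commutator equals the Lie bracket.

```python
from fractions import Fraction

# Toy model (an assumption, not the setting of [9]): g = W x C with W = Q^2 standing
# in for R^2, C = Q standing in for R, and bracket [(w, c), (w', c')] = (0, omega(w, w')).

def omega(w, v):
    """Antisymmetric bilinear map W x W -> C."""
    return w[0] * v[1] - w[1] * v[0]

def bracket(g1, g2):
    """Lie bracket; it only sees the W-components, so [g, C] = 0."""
    return ((0, 0), omega(g1[0], g2[0]))

def mul(g1, g2):
    """Group law g1 . g2 = g1 + g2 + (1/2)[g1, g2] from Baker-Campbell-Hausdorff."""
    (w1, c1), (w2, c2) = g1, g2
    return ((w1[0] + w2[0], w1[1] + w2[1]), c1 + c2 + Fraction(1, 2) * omega(w1, w2))

def inv(g):
    """Inverse g^{-1} = -g, since [g, g] = 0."""
    (w, c) = g
    return ((-w[0], -w[1]), -c)

identity = ((0, 0), 0)

def elem(w1, w2, c):
    """Build an element of G with exact rational coordinates."""
    return ((Fraction(w1), Fraction(w2)), Fraction(c))

a, b, d = elem(1, 2, 3), elem(3, -1, Fraction(1, 3)), elem(-2, 5, 0)
commutator = mul(mul(a, b), mul(inv(a), inv(b)))   # a b a^{-1} b^{-1}
```

Because the group is nilpotent of step two, the Baker-Campbell-Hausdorff series stops after the bracket term, which is why this closed-form product is exact.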
If $B_t$ is a standard Brownian motion on $(W, \mu)$, we may define a left-invariant Brownian motion $g_t$ on $G$ by the formula $g_t = \left(B_t, \frac{1}{2}\int_0^t [B_s, dB_s]\right)$. Let $\nu_t = \mathrm{Law}(g_{2t})$ be the heat kernel measure induced by $g_t$. By analogy with the finite-dimensional Heisenberg group, one expects the measure $\nu_t$ to be "smooth" in some sense. One cannot express this smoothness in terms of a density with respect to Lebesgue or Haar measure, because these do not exist in infinite dimensions, but another reasonable notion of smoothness would be for $\nu_t$ to be quasi-invariant under left translation by elements of the Cameron-Martin subgroup $G_{CM}$. The main result of [9] is that this is in fact the case. (We also mention [16], where the same statement was shown by different means, namely by producing a density of $\nu_t$ with respect to the measure $\mu \times m$, where $\mu$ is the Gaussian measure on $W$ and $m$ is Lebesgue measure on $C$.) It is shown in [9] that the group $G$ can be approximated by finite-dimensional projection groups $G_P$, each of which is a nilpotent Lie group of step 2. This leads to a notion of smooth cylinder functions $F : G \to \mathbb{R}$ which can be differentiated in directions $X \in \mathfrak{g}_{CM}$, and thus a horizontal gradient $\nabla_H F : G \to H$ can be defined for such functions. If $\gamma : [0, 1] \to G_{CM}$ is an absolutely continuous horizontal path, then its derivative $\gamma'$ can be identified as a curve in $H$, and we have the chain rule $\frac{d}{ds} F(\gamma(s)) = \langle \nabla_H F(\gamma(s)), \gamma'(s) \rangle_H$. Moreover, $G_{CM}$ is a length space with respect to the horizontal distance $d_{CM}$, and so the estimates on $W_{a,b}(\delta_0, \delta_g)$ and $T_{a,b}(\delta_0, \delta_g)$ from Propositions 3.3 and 4.8 go through for $g \in G_{CM}$, with $d = d_{CM}$.
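In a toy model with $W = \mathbb{R}^2$ (and $\mu$ the standard Gaussian), $C = \mathbb{R}$ and bracket $(w, v) \mapsto w_1 v_2 - w_2 v_1$, the process $g_t$ can be approximated by replacing $\frac{1}{2}\int_0^t [B_s, dB_s]$ with a left-point Riemann sum. This is a sketch under those assumptions, and the function names are hypothetical. Because the bracket is antisymmetric and the coordinates of $B$ are independent, the Itô and Stratonovich integrals agree here. The sketch also shows why $g_t$ is left-invariant: the discrete path is exactly the right product $(0,0) \cdot (\Delta B_0, 0) \cdot (\Delta B_1, 0) \cdots$ of independent increments under the group law. Sampling at time $2t$ gives approximate samples from $\nu_t = \mathrm{Law}(g_{2t})$.

```python
import numpy as np

def omega(w, v):
    """Bracket W x W -> C in the toy model: omega(w, v) = w_1 v_2 - w_2 v_1.
    Works on arrays whose last axis has length 2."""
    return w[..., 0] * v[..., 1] - w[..., 1] * v[..., 0]

def brownian_on_G(T, n_steps, n_paths, rng):
    """Approximate samples of g_T = (B_T, (1/2) int_0^T [B_s, dB_s]) using the
    left-point sum (1/2) sum_k omega(B_{t_k}, B_{t_{k+1}} - B_{t_k})."""
    dt = T / n_steps
    dB = rng.standard_normal((n_paths, n_steps, 2)) * np.sqrt(dt)
    B = np.cumsum(dB, axis=1)                                   # B_{t_1}, ..., B_{t_N}
    B_prev = np.concatenate([np.zeros((n_paths, 1, 2)), B[:, :-1]], axis=1)  # B_{t_0}, ..., B_{t_{N-1}}
    area = 0.5 * omega(B_prev, dB).sum(axis=1)
    return B[:, -1], area

def right_product(dB):
    """Build the same discrete path as (0, 0) . (dB_0, 0) . (dB_1, 0) ... with
    g1 . g2 = g1 + g2 + (1/2)[g1, g2]; dB has shape (n_steps, 2)."""
    w, c = np.zeros(2), 0.0
    for step in dB:
        w, c = w + step, c + 0.5 * omega(w, step)   # right-hand side uses the old w
    return w, c

# Usage: approximate samples from nu_t = Law(g_{2t}) with t = 0.5, so T = 2t = 1.
rng = np.random.default_rng(1)
t = 0.5
B_T, A_T = brownian_on_G(2 * t, n_steps=100, n_paths=10_000, rng=rng)
# E[A_T^2] = T^2 / 4 for the exact area; the left-point sum gives T^2 (1 - 1/n_steps) / 4.
print(A_T.var())
```

The variance comparison is only a sanity check of the discretization; it says nothing about the quasi-invariance question, which concerns the infinite-dimensional limit.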
Now [9, Proposition 4.8] shows, by means of generalized curvature-dimension inequalities as introduced in [8], that each projection group $G_P$ satisfies a reverse logarithmic Sobolev inequality with a uniform constant of the form $C/t$, where $C$ depends only on the structure of $G$ and not on $P$. This can be restated as a reverse logarithmic Sobolev inequality, with the same constant $C/t$, for cylinder functions on $G$, and so, as in Proposition 6.2 above, we recover a version of the main quasi-invariance result of [9] and [16]: for all $t > 0$ and $g \in G_{CM}$, the heat kernel measure $\nu_t$ and its left translate $\nu_t^g$ are mutually absolutely continuous. Moreover, the bounds on $T_{0,b}(\nu_t, \nu_t^g)$ as in (6.6) translate into $L^q$ bounds on the Radon-Nikodym derivative $d\nu_t^g/d\nu_t$, albeit for values of $q$ which depend on $t$.