A DIFFUSION APPROACH TO STEIN’S METHOD ON RIEMANNIAN MANIFOLDS

. We detail an approach to developing Stein’s method for bounding integral metrics on probability measures deﬁned on a Riemannian manifold M . Our approach exploits the relationship between the generator of a diﬀusion on M having a target invariant measure and its characterising Stein operator. We consider a pair of such diﬀusions with diﬀerent starting points, and through analysis of the distance process between the pair, derive Stein factors, which bound the solution to the Stein equation and its derivatives. The Stein factors contain curvature-dependent terms and reduce to those currently available for R m , and moreover imply that the bounds for R m remain valid when M is a ﬂat manifold.


Introduction
The eponymous method to estimate integral metrics and semi-metrics on spaces of probability measures proposed by Charles Stein [Stein, 1972] has led to tremendous improvements in distributional approximation techniques.See, for example, the surveys by Barbour and Chen [2014] and Ross [2011].The method has mainly been developed for probability measures on R m or N m for m ≥ 1.The focus of this paper is on developing a version of the method that can be employed to approximate probability measures on an m-dimensional Riemannian manifold.
Abstracting Stein's method to a general space X in a heuristic manner is useful for elucidating its key ingredients and the ensuing challenges in developing a corresponding version on manifolds.The goal is to bound an integral (semi-)metric d H (µ, ν) := sup h∈H hdµ − hdν , between a probability measure ν and a target probability measure µ on X with respect to a class H of real-valued test functions on X .Stein's method is centred around the construction and study of an operator L that maps functions f : X → R in a certain class F into meanzero functions under µ: if X ∼ µ, then E [Lf (X)] = 0 for every f ∈ F .The operator L thus encodes information about µ and, when F is sufficiently large, one may determine a function f h ∈ F associated with every h ∈ H that solves the Stein equation (or the Poisson equation in PDE literature) h(x) − E [h(X)] = Lf h (x).As a consequence, bounding d H (µ, ν) reduces to bounding the term sup where Z ∼ ν, achieved in application-specific ways.An important implication, profitably used in some applications, is that the need to compute an expectation with respect to µ in d H is circumvented; an example is when ν is the empirical measure based on points x 1 , . . ., x n on X and µ represents a conjectured limit probability measure.Upper bounds on d H then depend explicitly on the smoothness of the functions in F .Hence, integral to the success of Stein's method in upper bounding d H are the following requirements: (a) construction of the operator L and identifying its domain F ; and, (b) determination of the solution f h and its regularity properties.
An introductory account on choices of L satisfying requirement (a) for various probability measures µ on R (or some subset thereof) is available in Ross [2011].When X = R m , m > 1, focus has mainly been restricted to the case when µ is a Gaussian measure (see Chatterjee and Meckes [2008], Barbour [1988], Meckes [2009]) although more recently results on extensions to non-Gaussian measures have appeared in Mackey and Gorham [2016], Mijoule et al. [2019], Fang et al. [2019], Chen et al. [2019].
An important observation by Barbour [1988] relates the operator L to the infinitesimal generator of a diffusion process on R m that solves an SDE with invariant measure µ.This observation enables identification, and examination, of the solution to the Stein equation with the transition semigroup associated with L. The diffusion approach hence opens up the possibility of defining L for µ on a manifold M by considering an SDE on M whose solution is a diffusion with invariant measure µ.
Broadly, this is the approach we adopt in this paper.On a complete Riemannian manifold (M , g) without boundary, we consider approximating probability measures of the form µ φ with density, up to a normalisation constant, e −φ with respect to the volume measure dvol for a smooth φ.Under some conditions on φ and the geometry of M , the diffusion with infinitesimal generator has µ φ as its invariant measure, where ∇ and ∆ are the (Riemannian) gradient and Laplace-Bertrami operators, respectively.The operator L φ generates mean-zero functions under µ φ .We address requirement (b) for generalising Stein's method to M by adapting the approach in Mackey and Gorham [2016] for log-concave measures µ on R m to the manifold setting.In their paper, bounds on lower-order derivatives of the solution f h , known as Stein factors (see Röllin [2012]), were derived by studying the distance between a pair of coupled diffusions X t and Y t with same invariant measure µ starting at distinct points.Analogously, we construct a pair of diffusions X t,x and Y t,y on M starting at x and y with identical generator L φ , and study the distance process ρ(X t,x , Y t,y ) around neighbourhoods of non-empty cut loci.In particular, when there is no first conjugate point contained in the cut locus to any given point in M we establish exponential pathwise contraction for trajectories of the two diffusions towards their initial points; on the other hand when first conjugate points are present, we establish a similar contraction property that holds on average.
The study of the distance process enables the determination of Stein factors which bound the Lipschitz constants of the solution f h , and its first and second derivatives, where the geometry of M manifests itself through curvature-dependent terms in the factors.The derived bounds on f h , as well as on its first and second derivatives, reduce to the ones of Mackey and Gorham [2016] for R m , which we show remain valid for complete, connected flat manifolds.The Stein factors are then used to construct upper bounds on integral (semi-)metrics between µ φ and another probability measure on M for specific choices of the class of test functions H.In particular, using the first order bound on f h , we derive an upper bound on the Wasserstein distance between µ φ and µ ψ .A related generalisation of Stein's method to manifolds, based on the approach of Fang et al. [2019], can be found in Thompson [2020].
The paper is organised as follows.In Section 2.1 we define relevant quantities and introduce notation, and in Section 2.2 we describe assumptions on probability measures and diffusions under consideration, and the key condition (3) and assumption (A1) for the derivation of our results; the conditions are explicated with some examples.In Section 3 we describe the coupling of a pair of diffusions on M and analyse their distance process, when conjugate points are absent (Section 3.1) and present (Section 3.2).In Section 4 we consider the Stein equation and its solution and derive Stein factor bounds that depend on the curvature of M .In Section 5, using the Stein factors, we derive bounds for integral (semi-)metrics: in Section 5.1 we derive an upper bound on the Wasserstein distance between µ φ and a probability measure of similar type, and in Section 5.2 we do the same for an integral semi-metric between µ φ and an arbitrary probability measure on M .

Preliminaries
2.1.Notation and definitions.We assume throughout that (M , g) is a complete and connected Riemannian manifold without boundary of dimension m and with covariant derivative D; by D i , i > 1 we then denote higher orders of D. We shall denote by ρ(x, y) the Riemannian distance between any two points x and y in M , and by dvol the Riemannian volume measure of (M , g).We denote by T x (M ) the tangent space to M at x ∈ M and by TM the tangent bundle of M .For k ≥ 1, C k (M ) denotes the class of k-times continuously differentiable real-valued functions on M , C(M ) denotes the set of continuous functions, and C 0 (M ) denotes continuous functions vanishing at infinity.The Lipschitz constant C 0 (h) of a Lipschitz function h ∈ C(M ) is defined as Higher-order Lipschitz constants of a function depend on bounding tensor fields.Accordingly, for each x ∈ M define the operator norm at x for a tensor field T on M , based on n-fold tangent vectors at x, as and call them the Lipschitz constants of D i h, where γ x,y denotes any possible minimal geodesic from y to x and Π γx,y denotes the parallel transport from T y (M ) to T x (M ) along γ x,y .Note that Dh = dh and that Hess h = D 2 h, where Hess h is the Hessian of h.Note also that sup with respect to a set of real-valued test functions H.

Key assumptions.
On M , we consider probability measures of the form e −φ dvol, with c(φ) = M e −φ dvol < ∞ and with support on the entire space M .We assume that φ ∈ C 2 (M ) is such that ∇φ is Lipschitz; specifically, we assume that Dφ has finite Lipschitz constant C 1 (φ).Throughout, X denotes a random variable with X ∼ µ φ .The uniformly elliptic operator L φ = 1/2 {∆ − ∇φ, ∇ } then is the infinitesimal generator of a Feller diffusion process that solves the Itô stochastic differential equation where Ric is the Ricci curvature tensor, then the corresponding semigroup P t = e tL φ is conservative (see Bakry [1986]), i.e., P t 1 ≡ 1 for all t > 0 or, equivalently, X t will, with probability one, not leave M in finite time.For a successful development of the Stein's method on M , we need the Bakry-Emery curvature criterion: there is a constant κ > 0 such that, (A1) : Ric(x) + Hess φ (x) 2κ g(x) ∀x ∈ M .
Remark 1.When M = R m , (A1) simplifies to v ⊤ Hess φ v 2κ for any unit (column) vector v in R m where, as usual, Hess φ is treated as an m × m matrix.Hence, (A1) reduces to the requirement in Mackey and Gorham [2016] that −φ is 2κ-strongly concave, noting that in their notation, φ here is − log p, up to a constant.This is also true if the Ricci curvature of M is always non-positive.In general, (A1) is weaker than the requirement that −φ is c-strongly concave for some c > 0.
Example 1.In order to elucidate condition (3) and assumption (A1) we look at a few example manifolds and probability measures µ φ .
(i) M is the standard sphere S m of dimension m.The function φ(x) corresponding to the von Mises-Fisher distribution M m (x 0 , c) takes the form φ(x) = −c cos(r(x)), with r(x) = ρ(x 0 , x) for c > 0 and a fixed point x 0 ∈ M .Since D 2 f (r) = f ′′ (r) dr × dr + f ′ (r)D 2 r on general manifolds and since D 2 r = cot(r){g − dr × dr} on S m (see Greene and Wu [1979]), it follows that and condition (3) holds for the von Mises-Fisher distribution with κ > max{−(m − 1) + c, 0}.However, if there is a κ > 0 such that assumption (A1) holds, then we must have 0 < c < m − 1.This requires in particular m > 1, and thus any von Mises-Fisher distribution on the circle fails to satisfy (A1).
(iii) M is the complex projective space CP m equipped with the Fubini-Study metric.This is also the Kendall shape space of configurations in R 2 with m + 1 labelled landmarks.
Let A be an (m + 1) × (m + 1) Hermitian matrix, i.e.A = A * and φ(z) = −z * Az, for z = x + iy ∈ C m+1 (column vectors) and |z| = 1, where A * denotes the complex conjugate transpose of A. Without loss of generality, we may assume that the smallest eigenvalue of A is zero.The corresponding µ φ is the complex Bingham distribution on CS m = S 2m+1 .Since φ(z) = φ(e iθ z), µ φ can be regarded as a distribution on M (see Kent [1994]).It can be shown that Hess φ (w, w) = 2{φ(z) − φ(w)} −2λ max for a horizontal (with respect to the projection from S 2m+1 to CP m ) unit vector w ∈ T z (S 2m+1 ), where λ max > 0 is the largest eigenvalue of A.
(iv) M is the rotation group SO(m) with the bi-invariant metric determined by g(E 1 , E 2 ) := − 1 2 tr(E 1 E 2 ) for skew-symmetric E 1 , E 2 , where m > 2. Assume that, for S ∈ M , φ(S) = −c tr(S 0 S) with S 0 ∈ SO(m) and a constant c > 0.Then, the corresponding µ φ is a von Mises-Fisher distribution on SO(m).It can be shown that Hess φ −c g.

The distance between coupled diffusions
Our approach to define the Stein equation on M and analyse properties of its solution rests on the construction of a pair of diffusions (X t , Y t ), and handling of the distance process ρ(X t , Y t ) between the pair.In particular, we prove exponential contraction of ρ(X t , Y t ) towards the initial points, and thus extend the approach used by Mackey and Gorham [2016] on R m to the manifold setting.In contrast to the Euclidean setting, since the distance function (x, y) → ρ(x, y) is not in C 2 (M × M ) if the cut locus of a point in M is not empty, analysis of the distance process ρ(X t , Y t ) requires additional care.
3.1.When no conjugate points are present in cut loci.We first consider the relatively simple situation where there is no conjugate point in the cut locus of any given point in M .In this setting, by modifying the arguments in Kendall [1986a] and Kendall [1986b], we are able to establish exponential pathwise contraction of distance between the diffusions, aided by a key result given in Lemma 3 in Appendix A of Supplementary Material, which expresses the distance function in terms of finitely many smooth functions in neighbourhoods of cut point, despite it not belonging to C 2 (M × M ).
Note first that, in terms of a Brownian motion B t on R m starting from the origin, the Itô differential equation (2) with initial condition where ds denotes the Stratonovich differential, H the horizontal lift from T M to the tangent bundle of the orthonormal frame bundle O(M ), where ξ 0 sits above x 0 .For an introduction to horizontal lifts and orthonormal frame bundles, see for example, Kobayashi and Nomizu [1963].
Theorem 1. Assume that M has the property that there is no conjugate point to any given point in M , and that the Bakry-Emery curvature criterion (A1) holds for a constant κ > 0.Then, for any x 0 , y 0 ∈ M , there is a pair of coupled diffusions (X t , Y t ) starting from (x 0 , y 0 ) such that both X t and Y t satisfy (2) and, for any ℓ 1, For any (x, v) ∈ T M , this map provides an intervening geodesic s → exp x (sv), 0 s 1, connecting x and exp x (v).The length of this geodesic is at least the distance between x and exp x (v).If the interior of this geodesic does not intersect the cut locus of x, then it is also a minimal geodesic between its two end points.Denote by Π(x,v) the parallel transport along this intervening geodesic from x to exp x (v) where, for our purpose, Π(x,v) is taken to be the identity map on T x (M ) if x = exp x (v) even though this may imply a discontinuity.
For any given (x 0 , y 0 Under the given assumptions, y 0 is not conjugate to x 0 .Then, if y 0 is a cut point of x 0 , a consequence of the proof of Lemma 3 in Appendix A of Supplementary Material is that there is a neighbourhood N of (x 0 , y 0 ) such that Exp −1 (N ) is a disjoint union of a finite number of open sets on TM and, restricted to each such set, Exp is a diffeomorphism from that set onto N .If y 0 is not a cut point of x 0 , then v 0 is uniquely determined by v 0 = exp −1 x0 (y 0 ) and a similar result holds with just one component in Exp −1 (N ).Hence, in particular, TM is locally a covering space of M × M .Within such a neighbourhood N of a given (x 0 , y 0 ), we can determine a continuous process (X t , V t ) ∈ TM starting from (x 0 , v 0 ) associated with (2), by solving the following coupled diffusions X t and where, similarly to Ξ and ξ 0 for X, Υ and η 0 are respectively a lift of Y to the orthonormal frame bundle O(M ) and η 0 sits above y 0 .Since B ′ t is also a Brownian motion on R m , both X t and Y t are diffusions satisfying (2) before they leave N .
When (X t , Y t ) hits the boundary of N , we can find a neighbourhood N ′ of (X t , Y t ) satisfying the above properties of N .Then, allowing V t to move discontinuously without altering (X t , Y t ) such that, after the jump, it satisfies (6), we can continue to run (X t , Y t ) within N ′ so defined.Note that, if X t0 = Y t0 for some t 0 0, then X t = Y t for t t 0 .
For (X t , Y t ) constructed as above, denote by ρ(X t , Y t ) the length of the intervening geodesic exp Xt (sV t ) between X t and Y t = exp Xt (V t ); and write γ t for the unit speed intervening geodesic from X t to Y t , that is, γ t (s) = exp Xt (sV t /|V t |).Note that ρ(X t , Y t ) depends implicitly on the choice of v 0 , which is not unique when y 0 is a cut point of x 0 .On the other hand, for any given v 0 which satisfies (6), ρ is a smooth function of (x, y) within the neighbourhood N chosen as above.However, the change of neighbourhood from N to N ′ usually results in a discontinuity for the process ρ(X t , Y t ).Nevertheless, ρ(X t , Y t ) is always continuous and where the latter becomes an equality immediately after the jump.Hence, to find an upper bound for ρ(X t , Y t ), it is sufficient to find an upper bound for ρ(X t , Y t ).
To bound ρ(X t , Y t ) we may assume, without loss of generality, that (X t , Y t ) lies in N for all t 0. Write u Denote by J i t the Jacobi vector field along γ t with J i t (0) = Ξ t u i and J i t (1) = Υ t v i .Then, since ρ is smooth under the assumption that (X t , Y t ) lies in a given neighbourhood of (x 0 , y 0 ), using the second-variation formula (see Cheeger and Ebin [1975]), a modification of the argument by Kendall [1986a] shows that the right hand side of ( 9) is given by where the integral is along γ t and R denotes the curvature tensor of M .
3.2.When conjugate points are present in cut loci.When conjugate points are present in cut loci in M , the construction of a pair of diffusions in the proof of Theorem 1 fails at such points.More precisely, if y 0 is a (first) conjugate point of x 0 along the geodesic exp x0 (sv), which also lies in the cut locus of x 0 , then D exp x0 (v) is singular.This means that it would be impossible to find a neighbourhood N of (x 0 , y 0 ) that has the properties described above following (6).In particular, it would be impossible to find a subset of TM , as specified there, such that Exp is a diffeomorphism from that subset onto N .It is evident from the proof of Theorem 1 that the existence of such a diffeomorphism offers a way to couple (X x,t , Y y,t ) at, and beyond, cut points.Nevertheless, we now show that it is still possible to construct a pair of diffusions on M with properties that (i) they both satisfy (2) and (ii) the expected distance between them contracts at least exponentially.This relies on a generalisation of the technique used in Theorem 5 of Kendall [1986b] to deal with the presence of conjugate points.In the nonconjugate part of the cut locus of M analysis proceeds as with Theorem 1.To warn us of when the diffusions get close to the first conjugate locus, we use the operator L φ , and monitor the value of its action on the distance function ρ; this value decays towards −∞ when the points approach the first conjugate locus.Effectively, we determine a neighbourhood N 2δ ⊂ M × M of the first conjugate locus in M × M for a constant δ that depends on κ and the injectivity radius of M .Once the coupled diffusions enter N2δ , the closure of N 2δ , we decouple them, run independent diffusions until they hit M \N δ , where N δ ⊃ N 2δ , and then return to coupling again.
We first need two preliminary results before stating and proving the main result in this section.Observe that the set Then, the construction (7) of (X t , Y t ) can be applied to the case when the starting point (x 0 , y 0 ) is in E and it remains valid until the first exit of (X t , V t ) from Ẽ.We now modify the construction by Kendall [1986b]: combine the coupled diffusions (X t , Y t ) defined by (7), while the corresponding (X t , V t ) is not too close to the boundary of Ẽ, with X t , Y t evolving independently.
For this, we first need a result on the distance function of two independent diffusions on M specified by (2).Lemma 3 in Appendix A of the Supplementary Material ensures the following property of ρ(x, y) on neighbourhoods of the cut locus (ii) for any (x, y) ∈ C \ C 0 , there is a neighbourhood N of (x, y) in M × M and two smooth functions ̺ 1 and ̺ 2 on N such that Since the (first)-conjugate part of C has co-dimension 2 in M × M (see Barden and Le [1997]), the result of that Lemma also implies that C 0 can be chosen to have co-dimension 2. Also, similarly to the argument at the beginning of the proof of Theorem 1, N in (ii) above can be chosen such that Exp −1 (N ) is a disjoint union of two open sets V 1 , V 2 in TM and, restricted to each V i , Exp is a diffeomorphism from that set to N .Then, the smooth function ̺ i (x ′ , y ′ ) constructed in the proof of Lemma 3 in Appendix A of the Supplementary Material is in fact the length of the geodesic from x ′ and y ′ , the initial tangent vector v i to which lies in V i .That is, using our notation for the length of intervening geodesics, we have . This leads to the following generalisation of Theorem 5 of Kendall [1986b] and of Theorem 3 of Barden and Le [1997].The proof of this generalisation is a slight modification of the proof for Theorem 3 of Barden and Le [1997] (see also Le and Barden [1995] for more detailed derivations), and we hence omit it here.
Lemma 1. Suppose that X t and Y t are independent diffusions on M , both satisfying (2).
Then, the distance ρ(X t , Y t ) is a semimartingale and, before the first time that where B t is a Brownian motion on R; L is a non-decreasing process that is locally constant outside C; and, for fixed x 0 and x = x 0 , and L φ,2 ρ is similarly defined with respect to the second argument of ρ, and where the operator L φ is defined by L φ = 1 2 {∆ − ∇φ, ∇ }.To detect that the coupled (X t , Y t ), constructed by ( 7), is close to the boundary of E and to control the independent diffusions X t and Y t , we need the following generalisation of a geometric description (see Kendall [1986b]), wherein we replace the Laplacian operator considered there with L φ , and replace the lower bound constant c determining the set O c (which was denoted by U c by Kendall [1986b]) by cρ(x, y).Since φ is in C 2 (M ), the proof for our result is analogous to that for the lemma in Kendall [1986b], and we omit it here.
We are now ready to prove the following result for Riemannian manifolds M with nonempty conjugate locus (e.g., spheres), which is weaker than Theorem 1 in that the exponential contraction between the diffusions towards their initial points is in expection and not pathwise.
Theorem 2. Assume that the Bakry-Emery curvature criterion (A1) holds for a constant κ > 0.Then, for any ℓ 1 and for any x 0 , y 0 ∈ M , there is a pair of diffusions (X t , Y t ) starting from (x 0 , y 0 ) such that both X t and Y t satisfy (2) and Note that, unlike the result of Theorem 1, the (X t , Y t ) constructed here will depend on ℓ.
We now construct diffusions X t and Y t , both satisfying (2), as follows.For given (x 0 , y 0 ) ∈ M × M , if there is a minimal geodesic between them which contains no conjugate point, we construct diffusions X t and Y t by solving (7) beginning at (x 0 , y 0 ).By allowing the corresponding (X t , V t ) to jump if necessary, as commented following the construction (7), we continue such a construction for (X t , Y t ) until the first time that (X t , V t ) leaves O 2δn .Suppose that (X t , V t ) leaves O 2δn at time τ .We then consider all minimal geodesics between X τ and Y τ containing no conjugate point and, if possible, choose one for which the corresponding (X τ , V τ ) lies in Ōδn .We then repeat the construction as before with the chosen new starting point.This iterated construction continues until the choice of such (X τ , V τ ) in Ōδn is no longer possible.
If it is not possible initially to choose a minimal geodesic containing no conjugate point, or if at some stage a choice of the above (X τ , V τ ) in Ōδn is impossible, then we continue the construction of X t and Y t by evolving them independently until (X t , Y t ) hits Ōδn .
To show that the required result holds for (X t , Y t ) constructed in such a way, it is sufficient by Theorem 1 to restrict to the case when X t and Y t evolve independently.Then, (X t , Y t ) is not in Ōδn .Recalling that a co-dimension 2 set in M × M is a polar set of a non-degenerate diffusion on M × M it follows from Lemmas 1 and 2 and from the choice of where M t is a martingale.Hence, we have E ρ(X t , Y t ) ℓ ρ(x 0 , y 0 ) ℓ e −ℓκt as required.
Remark 2. In the literature, there are several ways to construct couplings for proving the existence of contractivity.For example, in the curvature setting, the framework of weighted Riemannian manifolds is now part of a broader one for CD-spaces (see e.g., Sturm [2006a,b]).In this context, the existence of contractive couplings was treated by Kuwada [2010], von Renesse and Sturm [2005].In particular, the Kuwada duality theorem (see Kuwada [2010], Theorem 2.2), in conjunction with the implication of contractivity of the heat flow under Curvature-Dimension condition, implies the existence of a contractive coupling such as in the proof of Corollary 1 in von Renesse and Sturm [2005].The coupling we construct here, in addition to proving the required contractivity, will also be employed in the Supplementary Material to study certain stochastic vector fields along the paths X x,t and Y y,t , which play important roles in obtaining the Stein factors.

Solution to the Stein equation and Stein factors
We are now ready to turn our attention to the Stein equation where h belongs to a suitable class of real-valued test functions on M .Using the distance process ρ(X x,t , Y y,t ) for a pair of diffusions (X x,t , Y y,t ) constructed above, in this Section we determine the solution f h to the Stein equation ( 16) and examine its properties.

The solution f
Proposition 1.Let M be a complete and connected Riemannian manifold.Assume that the Bakry-Emery curvature criterion (A1) holds for a constant κ > 0 and that X is a random variable on M with distribution µ φ such that E [ρ(X, x)] < ∞ for some x ∈ M .For every h ∈ H 0 the function ).Thus, Proposition 1(ii) recovers the corresponding result in Mackey and Gorham [2016], as the constant 2κ here corresponds to constant k there.Moreover, the result of Proposition 1(ii) is equivalent to that of Proposition 6.1 in Thompson [2020].
Proof.Let (X x,t , Y y,t ) be the pair of diffusions in Theorem 2 with ℓ = 1, starting from (x, y).
Then, both X x,t and Y y,t satisfy (2).Since µ φ is the invariant measure for Y t , using the Lipschitz property of h and Theorem 2, This proves that f h is well-defined.Now, for any x, y ∈ M , The next result shows that the function f h defined by (18) solves the Stein equation for the probability measure µ φ .
Theorem 3. Assume that M is a complete and connected Riemannian manifold and that Bakry-Emery curvature criterion (A1) holds for a constant κ > 0. Let X be a random variable on M with distribution µ φ such that E [ρ(X, x)] < ∞ for some x ∈ M .For h ∈ H 0 , the function f h in (18) solves the Stein equation (16).
Remark 4. When M = R m this result recovers the result by Mackey and Gorham [2016]; in particular, E [L φ f h (X)] = 0. On the other hand, the Bakry-Emery curvature criterion (A1) implies certain restrictions on the probability measures to which we can apply Theorem 3.For example, as noted in Example 1(i), one cannot apply it to von Mises-Fisher distributions on the circle.In this case, using direct integration by parts, for probability measures µ φ with X ∼ µ φ on S 1 , the function Lewis [2021]).
Proof.Let X x,t be a diffusion starting from x and satisfying (2).Since the corresponding semigroup {P t | t 0} is strongly continuous on C 0 (M ) and L φ is the infinitesimal generator of X x,t , we have [Ethier and Kurtz, 1986, Prop. 1.5].However, for h(x) = h(x) + a where Then, by taking a = E [h(X)] and noting L φ (a) = 0, we can also write the above as Now, take (X x,t , Y y,t ) to be the pair of diffusions, starting from (x, y), as Theorem 2 with ℓ = 1.Since Y t satisfies (2), the fact that µ φ is the invariant measure of Y t gives that where the last inequality follows from Theorem 2 and where C 0 (h) is the Lipschitz constant for h.Thus, lim t→∞ On the other hand, the result of Theorem 2 implies that we may apply the Dominated Convergence Theorem to obtain that, as t → ∞, the right hand side of ( 19) tends to as required.

Stein factors.
In the literature, Stein factors refer to bounds on solutions f h of the Stein equation ( 16).A direct consequence of Proposition 1 and Theorem 3 is that f h defined by ( 18) is differentiable and Df h is bounded.
Proposition 2. Under the conditions of Theorem 3, Df h exists and where f h is defined by (18).
We will see later in Section 5.1 that the bound on Df h given above suffices to bound the Wasserstein distance between the probability measure µ φ and another µ ψ ∝ e −ψ .However, for bounding more general integral (semi-)metrics, bounds on first-and second-order derivatives of f h , known as Stein factors, are needed.
Accordingly, denote by Ric ♯ φ the tensor equivalent to Ric + Hess φ in the sense that, for any x ∈ M , and for any u, u Recall that (see O'Neill [1983]) (21) and that, in terms of a (local) frame field e where R denotes the Riemannian curvature tensor.Thus, it is possible to express Ric ♯ φ explicitly in terms of the frame field as We can define the Lipschitz constant for Ric ♯ φ in a similar way to the definition of the Lipschitz constant given in (1).Let ( 23) Proposition 3. Assume that the conditions of Theorem 3 hold.Assume further that Ric ♯ φ is Lipschitz with finite Lipschitz constant L(Ric ♯ φ ).For every h ∈ H 1 with f h defined in (18), Df h is Lipschitz with constant . Thus, Proposition 3 recovers the corresponding result in Mackey and Gorham [2016].On the other hand, the result of Proposition 3 differs from the corresponding Proposition 6.2 in Thompson [2020]: in theirs, the relationship between the constant c 1 obtained and those given in the assumptions is not specified; using our notation, the upper bound for C 1 (f h ) there would depend only on C 0 (h) while ours depends on both C 0 (h) and C 1 (h).
Proof.The proof uses Lemmas 4 and 5 given in Appendix B of Supplementary Material.For any x ∈ M and v ∈ T x (M ), consider the vector field v x t along the path X x,t which solves the differential equation with v x 0 = v, where X x,t is the solution to (2).It is known that, for any fixed t > 0 and under the given condition for h, N s = D E h(X Xx,s,t−s ) (v x s ) is a local martingale for 0 s t (see Thalmaier [1997]).Since Thompson [2020, Theorem 11.2], where the Z there corresponds to −2∇φ here.)Thus, from the definition of f h , the Dominated Convergence Theorem and Theorem 2, it follows that, for any v ∈ T x (M ), Now, consider the pair of diffusions (X x,t , Y y,t ), starting from (x, y), in Theorem 2 with ℓ = 2. First, by applying the Hölder inequality, Theorem 2 and Lemma 4 (Appendix B of Supplementary Material), we have that Moreover, writing v y t for the solution of (33) with the underlying path X x,t replaced by Y y,t and with the initial condition v y 0 = Π γx,y (v), and denoting Π γ X x,t ,Y y,t (v x t ) by ṽx t , we also have where the second inequality follows from Lemma 5 (Appendix B of Supplementary Material) with q = 1.
The argument in Remark 5 regarding the case when M = R m can be extended to the case when M has constant Ricci curvature, which implies that the bounds in Mackey and Gorham [2016] continue to hold for such M .This gives the following corollary.
Corollary 1. Assume that the conditions of Theorem 3 hold.Assume further that M is Ric flat and φ has finite Lipschitz constant C 2 (φ).Then, for every h ∈ H 1 and f h as defined in (18), Df h is Lipschitz with constant The curvature of the manifold plays a more explicit role in the Lipschitz constant for D 2 f h .To see this, define the tensor d ⋆ R by Noting that R(∇φ)(u, v) = R(∇φ, u)v, to simplify notation, we also define The bound on Df h requires restriction to the smaller and smoother class H 1 ; the same is required when bounding D 2 f h .Let (29) Proposition 4. Assume that the conditions of Theorem 3 hold and that are both finite, where R ♯ φ is defined by (35).Further, assume that Ric ♯ φ , R ♯ φ and R are all Lipschitz with finite Lipschitz constants L(Ric ♯ φ ), L(R ♯ φ ) and L(R) respectively.For every h ∈ H 2 with f h defined in (18): (ii) If χ 2 > 0 and κ > 1/2, then D 2 f h exists and is Lipschitz with constant Remark 6.Note that, χ 2 = 0 corresponds to M being a flat manifold, such as a Euclidean space, a cylinder or a flat torus.Consequently, Our result thus recovers the corresponding bound given in Mackey and Gorham [2016] for R m , where L i , M i (h) and k in Mackey and Gorham [2016] correspond respectively to C i−1 (φ), C i−1 (h) and 2κ here.Our result establishes that their upper bound also holds for general complete and connected flat manifolds.
On the other hand, if M is locally symmetric, we have DR = 0.Then, it follows from (21) and ( 22 . As symmetric manifolds are locally symmetric, this will hold for a class of familiar manifolds, such as spheres, hyperbolic spaces, projective spaces and the space of positive definite symmetric matrices.Pertinently, the upper bound for C 2 (f h ) in Proposition 4 when χ 2 = 0 is not the limit, as χ 2 → 0, of that for χ 2 > 0. In addition, we need an extra requirement for κ when χ 2 > 0.
Proof.The proof uses Lemmas 4, 5, 6 and 7 given in Appendix B of Supplementary Material.Consider the vector field V x t along the path X x,t which satisfies the stochastic covariant Itô equation with V x 0 = 0, where Ξ is defined in (4), R ♯ φ and Ric ♯ φ are defined by ( 35) and ( 20) respectively, and where u x t and v x t are the solutions of (33) both with the underlying path X x,t and with the initial conditions u x 0 = u and v x 0 = v respectively.It is known that, for h satisfying the given conditions, ) is a local martingale for 0 s t, for any fixed t > 0 [Thompson, 2020, Lemma 11.3].Since , which implies that, for any fixed t > 0 and u, v ∈ T x (M ), Then, the definition of f h , the Dominated Convergence Theorem and Theorem 2 together ensure that D 2 f h exists and that, for any u, v ∈ T x (M ), Now, we construct a pair of diffusions (X x,t , Y y,t ), starting from (x, y), as in Theorem 2. Since we need to apply Lemmas 5 and 7 (Appendix B of Supplementary Material) to the processes related to (X x,t , Y y,t ) in the following proof, it is necessary to take the parameter ℓ in the construction of (X x,t , Y y,t ) to be 6.As in the proof of Proposition 3, write u y t and v y t for the solutions of (33) with the underlying path X x,t replaced by Y y,t and with the respective initial conditions u y 0 = Π γx,y (u) and v y 0 = Π γx,y (v).Also, let ũx t denote Π γ X x,t ,Y y,t (u x t ), and similarly for ṽy t and Ṽ y t .Then, Under the given conditions on h, the first term on the right hand side of (32) can be estimated as Similarly, for the second term on the right hand side of (32), we have that By the Hölder inequality, Theorem 2 and Lemmas 4, 5, 6 and 7 (Appendix B of Supplementary Material), it follows from the above estimations and from (32) that, if 2κ − 1 when κ > 1/2, as required.
If χ 2 = 0, we need to modify the above application of Lemmas 6 and 7 (Appendix B of Supplementary Material).This results in 1 ρ(x, y) |u| |v| This shows that Ddf h is Lipschitz with the required constant.

Application to bounding integral (semi-)metrics
A key application of Stein's method is in obtaining upper bounds on an integral (semi-)metric d H (X, Z), with respect to some function class H, for an arbitrary random variable Z ∼ ν.Exploiting the characterising property of the operator L φ , ∀h ∈ H, the task then reduces to obtaining a uniform upper bound on E [L φ f h (Z)] over functions f h using the Stein factors.The quantity d H is clearly a semi-metric and is a metric only if H separates points in the set of signed measures on M .
5.1.Wasserstein distance between µ φ and µ ψ .The result of Theorem 3 in conjunction with the first-order bound in Proposition 2 can be used to obtain an upper bound on the 1-Wasserstein distance between certain types of random variables.For this we consider the function class , under which d H is a bonafide metric.The 1-Wasserstein distance between two random variables Z 1 and Z 2 on M is then defined as Theorem 4. Assume that the conditions of Theorem 3 hold.Let Z ∼ µ ψ such that E [ρ(Z, x)] < ∞ for some x ∈ M , where ψ satisfies (3) with some constant κ ′ > 0. Then Proof.The proof pursues a similar argument to that of Proposition 4.1 of Mijoule et al. [2019].Note first that sup For h ∈ H 1 ≤1 ∩ C 0 (M ), we have by Theorem 3 that On the other hand, the given assumption that Z ∼ µ ψ , where ψ satisfies (3), also implies that E so that the result follows from Proposition 2.
(i) The functions φ and ψ corresponding to von Mises-Fisher distributions M (x 1 , c 1 ) and M (x 2 , c 2 ) are respectively −c 1 cos ρ(x 1 , x) and −c 2 cos ρ(x 2 , x).Then, From the symmetry between φ and ψ, it follows that the Wasserstein-1 distance d W between M (x 1 , c 1 ) and M (x 2 , c 2 ) is bounded: where X i ∼ M (c i , x i ).
(ii) The function ψ corresponding to the Fisher-Watson distribution Hence, for X ∼ M (x 1 , c 1 ) and If c ∈ (0, (m − 2)/2), there is a κ > 0 such that the Bakry-Emery curvature criterion (A1) holds, as seen in Example 1(iv).Then, if Z is a uniform random variable on SO(m), S 0 Z is also a uniform random variable and so where and where C i (f h ) are bounded as in Propositions 3 and 4.
as required.
A further simplification occurs when M is compact.
Corollary 3. If M is compact then, for any Lipschitz function on M with C 0 (h) 1, any fixed ǫ > 0 and s > 0, there exists a g ∈ C 2 (M ) with Lipschitz constants C i (g), i = 0, 1, 2, such that C 0 (g) 1 + s and Proof.Since M is compact, any g ∈ C ∞ (M ) has bounded derivatives, and thus possesses finite Lipschitz constant C i (g), i = 0, 1, 2, . . ., k for every k.This ensures that Lipschitz constants C i (f g ), i = 0, 1, 2 of the Stein equation solution f g are finite.
The existence of the requisite g ∈ C 2 (M ) is guaranteed by the result in Azagra et al. [2007] on existence of a C ∞ Lipschitz approximation of a Lipschitz function.By Theorem 1 in Azagra et al. [2007], for every Lipschitz function h on M with Lipschitz constant 1 and for every ǫ, s > 0, there exists a g ∈ C ∞ (M ) such that sup x∈M |g(x) − h(x)| < ǫ with C 0 (g) 1 + s.Thus, by applying Corollary 2 to g, we have as required.

Consider the function class
from Propositions 1, 3 and 4, as well as Corollary 2, the following result on the bound for the integral (semi-)metric Theorem 5. Assume that the conditions of Proposition 4 hold, and that φ is Lipschitz with Lipschitz constants C i (φ), i = 0, 1.Then, for any random variable Z on M , where, if χ 2 = 0, and where the constants χ 1 , χ 2 and β are as in Proposition 4. (ii) For q > 1, we use similar arguments to those in (i) above.First, applying the Hölder inequality, Theorem 2 and Lemma 4, we obtain from (36) that 1−1/q E |v x t | 2q 1/(2q) E ρ(X x,t , Y y,t ) 2q 1/(2q) dt 1−1/q e −2κt dt.