Long-time behaviour of degenerate diffusions: UFG-type SDEs and time-inhomogeneous hypoelliptic processes

We study the long-time behaviour of a large class of diffusion processes on R^N, generated by second-order differential operators of (possibly) degenerate type. The operators that we consider need not satisfy the Hörmander condition. Instead, they satisfy the so-called UFG condition, introduced by Hermann, Lobry and Sussmann in the context of geometric control theory and later by Kusuoka and Stroock, this time with probabilistic motivations. In this paper we demonstrate the importance of UFG diffusions in several respects: roughly speaking, i) we show that UFG processes constitute a family of SDEs which exhibit multiple invariant measures and for which one is able to describe a systematic procedure to determine the basin of attraction of each invariant measure (equilibrium state); ii) we show that our results and techniques, which we devised for UFG processes, can be applied to the study of the long-time behaviour of non-autonomous hypoelliptic SDEs; iii) we prove that there exists a change of coordinates such that every UFG diffusion can be, at least locally, represented as a system consisting of an SDE coupled with an ODE, where the ODE evolves independently of the SDE part of the dynamics; iv) as a result, UFG diffusions are inherently "less smooth" than hypoelliptic SDEs; more precisely, we prove that UFG processes do not admit a density with respect to the Lebesgue measure on the entire space, but only on suitable time-evolving submanifolds, which we describe.

1. Introduction

1.1. Context and scope of the paper. Consider stochastic differential equations (SDEs) in R^N of the form

dX_t = V_0(X_t) dt + Σ_{i=1}^d V_i(X_t) ∘ dB_i(t),   (1)

where V_0, . . ., V_d are smooth vector fields on R^N, ∘ denotes Stratonovich integration and B_1(t), . . ., B_d(t) are one-dimensional independent standard Brownian motions. The Markov semigroup {P_t}_{t≥0} associated with the SDE (1) is defined on the set C_b(R^N) of continuous and bounded functions as

(P_t f)(x) := E[f(X_t^{(x)})],  f ∈ C_b(R^N).   (2)

We recall that, given a vector field V : R^N → R^N, we can interpret V both as a vector-valued function on R^N and as a first-order differential operator on R^N:

(Vf)(x) = Σ_{j=1}^N V^j(x) ∂_{x_j} f(x).   (3)

Date: May 4, 2018.
With this notation, the Kolmogorov operator associated with the semigroup P_t is the second-order differential operator given on smooth functions by

L = V_0 + (1/2) Σ_{i=1}^d V_i².   (4)

The Markov diffusion X_t is called hypoelliptic (elliptic, respectively) when the operator L is hypoelliptic (elliptic, respectively) [3]. The study of diffusion processes of hypoelliptic type has by now produced a fully-fledged theory, involving several branches of mathematics: stochastic analysis, analysis of differential operators, (sub-)Riemannian geometry and control theory. One of the key steps in the development of such a theory has been the seminal paper of Hörmander [25], and a large body of work has been dedicated for over forty years to the study of diffusion processes under the Hörmander condition (HC) (in one of its many forms), which is a sufficient condition for hypoellipticity. In particular, the ergodic theory for hypoelliptic SDEs is well developed, see [17,21,48,49] and references therein; throughout the paper we say that a process is ergodic if it admits a unique invariant measure (stationary state).
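As a concrete illustration of (3) and (4), the short sketch below builds the operator L = V_0 + (1/2) Σ V_i² symbolically with sympy; the fields V_0, V_1 are hypothetical, chosen purely for illustration and not taken from the paper.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = [x1, x2]

def apply_field(V, f):
    # A vector field V = (V^1, ..., V^N) acts on f as a first-order
    # operator: (Vf)(x) = sum_j V^j(x) * d f / d x_j, as in (3).
    return sum(Vj * sp.diff(f, xj) for Vj, xj in zip(V, coords))

# Hypothetical smooth fields on R^2 (illustration only).
V0 = [x2, -x1]   # drift field
V1 = [0, 1]      # single diffusion field (d = 1)

def L(f):
    # Kolmogorov operator L = V0 + (1/2) V1^2 acting on f, as in (4).
    return sp.simplify(apply_field(V0, f)
                       + sp.Rational(1, 2) * apply_field(V1, apply_field(V1, f)))

f = x1**2 + x2**2
print(L(f))  # prints 1: the rotational drift kills f, the diffusion part gives 1
```

The same helper applies to any number of diffusion fields by summing the second-order terms over i = 1, . . ., d.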
To the best of our knowledge, this is the first paper that attempts to build a framework for the study of the long-time asymptotics of solutions of SDEs which are not necessarily hypoelliptic. We will work in the setting in which the vector fields V_0, . . ., V_d satisfy a weaker condition, the so-called UFG condition. The acronym UFG stands for Uniformly Finitely Generated. We give a precise statement of the UFG condition in Definition 3.1. For the moment let us just point out the following: the parabolic Hörmander condition imposes

span{V(x) : V ∈ ⋃_{j≥1} L_j(x)} = R^N  for every x ∈ R^N,   (PHC)

where, as customary, the hierarchy of operators L_j is defined as L_1(x) := {V_1(x), . . ., V_d(x)} and, for j > 1, L_j(x) = L_{j−1}(x) ∪ {[V, V_k](x) : V ∈ L_{j−1}, k ∈ {0, . . ., d}}. Under the UFG condition the vector space appearing in (PHC) is not required to have constant rank; roughly speaking, it is only required to be finitely generated. In particular, we emphasize that the UFG condition does not require the vector space in (PHC) to be isomorphic to R^N at any x ∈ R^N. Hence, in this sense, the UFG condition is weaker than the parabolic Hörmander condition. The UFG condition has long been known to the (geometric) control theory community, although perhaps under other names (see Section 3 for a more detailed account of the matter), and it is indeed well studied in the works of Hermann, Lobry and Sussmann [24,37,55]. It was then considered by Kusuoka and Stroock in the eighties [31,32,33,34], though in a completely different context (which we briefly explain below); the purpose there was to study smoothing properties of the semigroup P_t under the UFG condition. In this paper we combine the geometric viewpoint with the functional-analytic and probabilistic one to introduce new results on the asymptotic behaviour of UFG diffusions. In broad terms, the two main achievements of this paper can be described as follows: i) We study the diffusion process (1) in absence of the Hörmander condition.
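The hierarchy L_j can be computed mechanically. The sketch below does so with sympy for a hypothetical Grushin-type pair of fields (our choice of V_0, V_1 is illustrative and not taken from the paper), and checks the rank of the spanned space at two points: the first-level fields degenerate at x_1 = 0, but an iterated bracket restores full rank.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = sp.Matrix([x1, x2])

def bracket(V, W):
    # Lie bracket of vector fields: [V, W] = (DW) V - (DV) W
    return sp.simplify(W.jacobian(X) * V - V.jacobian(X) * W)

# Illustrative fields: V0 = x1^2 d/dx2, V1 = d/dx1
V0 = sp.Matrix([0, x1**2])
V1 = sp.Matrix([1, 0])

# Build the hierarchy L1 subset L2 subset L3 by bracketing with V0 and V1
L1 = [V1]
L2 = L1 + [bracket(V, W) for V in L1 for W in (V0, V1)]
L3 = L2 + [bracket(V, W) for V in L2 for W in (V0, V1)]

def rank_at(fields, point):
    # rank of span{X(p) : X in fields} at the given point
    return sp.Matrix.hstack(*fields).subs(point).rank()

origin = {x1: 0, x2: 0}
print(rank_at(L2, origin))  # prints 1: V1 and [V1, V0] = (0, 2 x1) degenerate at 0
print(rank_at(L3, origin))  # prints 2: [[V1, V0], V1] = (0, -2) restores full rank
```

For this pair (PHC) in fact holds everywhere; genuinely UFG, non-Hörmander examples (such as Examples 3.2 and 3.3 below) can be explored with the same helper.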
To this end, we establish explicit connections between the geometric theory of finitely generated Lie algebras and the related stochastic dynamics. Because every (uniformly) hypoelliptic process is a UFG process, our results cover a very large class of SDEs. In particular we show that our approach can be fruitfully employed to study the asymptotic behaviour of non-autonomous hypoelliptic diffusions.
ii) We argue that UFG processes constitute a class of SDEs which exhibit, in general, multiple equilibria and for which one is able, given an initial datum, to determine the invariant measure to which the dynamics will converge.
Let us further remark on the significance of the latter point: although a large body of work has been devoted to the study of ergodic processes, the development of a general framework to understand problems with multiple equilibria is at a very early stage. It is well known that ergodic processes will, under appropriate general conditions, converge to their unique equilibrium irrespective of the initial configuration, i.e. they will tend to lose memory of the initial datum. Clearly this cannot be the case, in general, for more complicated systems. When the invariant measure is not unique, it is typically extremely difficult to determine the basin of attraction of each equilibrium measure, and we are indeed not aware of any criteria developed to this effect. To be more precise, one can ask one of two (complementary) questions: given an initial datum for the SDE, which equilibrium measure will the process converge to? Conversely, given an equilibrium measure µ, one may wish to describe the basin of attraction of such a measure, i.e. the set of initial data x ∈ R^N such that the process X_t^{(x)} converges to µ.¹ Beyond numerical simulations, no theoretical framework currently exists to tackle problems of this kind.
In this paper we introduce a systematic way to study long-time convergence for a large class of SDEs which will, in general, admit several stationary states. This methodology applies to UFG diffusions and hence, because processes that satisfy the (uniform) parabolic Hörmander condition are UFG processes, our results produce further understanding of non-ergodic Hörmander processes. We stress here, and will emphasize again in Section 4, that hypoelliptic processes need not be ergodic (see Section 4 for examples of hypoelliptic processes which are not ergodic).
The Markov diffusions studied here are linear, in the sense that their generators (4) are linear second-order differential operators. As a point of comparison, another class of systems exhibiting multiple equilibria is the class of so-called collective dynamics: in this case the system is constituted by a large number of particles or agents that interact with each other. The underlying kinetic PDEs for this type of model are non-linear in the sense of McKean, and the existence of multiple stationary states there is due to such a nonlinearity. In our case, the nature of the phenomenon is completely different and in a way simpler, as multiple invariant measures arise as a result of the non-trivial control theory implied by the UFG condition.
In the remainder of this introduction we comment on the implications and significance of the UFG condition first from an analytic perspective and then from a geometric and probabilistic viewpoint. In Subsection 1.2 we explain the main results of the paper and the reasons for studying UFG diffusions; we then conclude the introduction with Subsection 1.3, where we illustrate the organization of the paper.
As is well known, under the (parabolic) Hörmander condition the transition probabilities of the semigroup P_t have a smooth density; furthermore, P_t f is differentiable in every direction and u(t, x) := (P_t f)(x) is a classical solution of the Cauchy problem

∂_t u(t, x) = L u(t, x),  u(0, x) = f(x).
In the present paper we will relax the hypoellipticity assumption and study the long-time behaviour of the dynamics (1) in absence of the Hörmander condition.
In a series of papers [8,10,13,31,32,33,34], Kusuoka and Stroock first, and Crisan and collaborators later, have analyzed the smoothness properties of diffusion semigroups {P_t}_{t≥0} associated with the stochastic dynamics (1) when the vector fields {V_i, i = 0, 1, ..., d} satisfy the UFG condition. Such works showed that, as opposed to what happens under the PHC, under the UFG condition the semigroup P_t is no longer differentiable in every direction; in particular it is no longer differentiable in the direction V_0, but it is still differentiable in the direction V := ∂_t − V_0 when viewed as a function (t, x) → u(t, x) over the product space (0, ∞) × R^N. This fact has been proved by means of Malliavin calculus; in this paper we give a geometric and analytic explanation of such a phenomenon. Because of the differentiability in the direction V, a rigorous PDE analysis can still be built starting from the stochastic dynamics (1). In this case one can indeed prove that, for every f ∈ C_b (continuous and bounded), the function u(t, x) := (P_t f)(x) is a classical solution of the Cauchy problem

∂_t u(t, x) = L u(t, x),  (t, x) ∈ (0, ∞) × R^N,  u(0, x) = f(x).   (5)

¹ We use the notation X_t^{(x)} to emphasize the fact that the initial datum of the process is X_0 = x.
From a geometrical and control-theoretical point of view, working with the UFG condition implies dealing with distributions of non-constant rank. While the geometric understanding of the Hörmander condition is rooted in the classical Frobenius Theorem, which deals with distributions of constant rank, the geometry of the UFG condition is depicted in the works of Hermann, Lobry and Sussmann [24,37,55]. In these works the UFG condition was considered for geometric and control-theoretical purposes, in particular for the study of reachability (i.e., roughly speaking, to answer questions regarding the set of points that can be reached by the integral curves of given vector fields). In this respect we should stress that the UFG condition is not optimal from a control-theoretical point of view (an optimal condition for reachability has been described by Sussmann [55]). However, it is the closest to being optimal while still being easy to check in practice.
Finally, from a probabilistic standpoint, it is well known that the Hörmander condition is a sufficient (and almost necessary) condition for the law of the process (1) to have a density, see [22], and this fact has motivated the large literature on hypoelliptic SDEs. Again, the understanding of this matter relies on the Frobenius Theorem, as Hörmander himself noted [25]. In this paper we show that, while (as expected in absence of the Hörmander condition) the SDE (1) does not admit a density on R^N, it does admit a density on appropriate time-dependent submanifolds of R^N, which we explicitly describe in Section 8.

1.2. Main results.
The main results of this paper are the following: Proposition 4.7, Proposition 5.1 and Proposition 5.7 give a description of the global behaviour of the dynamics (1), under the sole assumption that the vector fields V 0 , . . ., V d satisfy the UFG condition; Theorem 6.6 and Theorem 7.8 describe the long time behaviour of non-autonomous hypoelliptic processes and of UFG processes, respectively, identifying invariant measures and characterizing their basin of attraction; finally in Theorem 8.8 we describe appropriate manifolds where the process X t admits a density. Let us give a rather informal description of such results. Precise notation, assumptions and statements are deferred to the relevant sections.
A distribution ∆ on R^N is a map that, to each point x ∈ R^N, associates a linear subspace of the tangent space T_x R^N. Given a set D of smooth vector fields on R^N, the distribution generated by D, denoted by ∆_D, is the map x → span{X(x) : X ∈ D}. Let us introduce two distributions, ∆(x) and ∆_0(x), that will play a fundamental role in this paper. To avoid having to set too much notation and nomenclature, we introduce them now informally, but we will give precise definitions at the beginning of Section 4. The distribution ∆ is generated by the vector fields contained in the Lie algebra (PHC), i.e. the distribution

∆(x) = span{V(x) : V ∈ ⋃_{j≥1} L_j(x)},

while

∆_0(x) = span{Lie{V_0(x), V_1(x), . . ., V_d(x)}} = span{V_0(x) ∪ ∆(x)}.
Clearly, ∆(x) ⊆ ∆_0(x) for every x ∈ R^N, and the two distributions coincide at x if and only if V_0(x) is a combination of the vectors contained in ∆(x). More precisely, we decompose the vector V_0 into a component which belongs to ∆, V_0^{(∆)}, and a component which is orthogonal to ∆, V_0^{(⊥)}:

V_0(x) = V_0^{(∆)}(x) + V_0^{(⊥)}(x).   (9)

In other words, V_0^{(∆)}(x) is the orthogonal projection of V_0(x) on the vector space ∆(x), so the two distributions coincide if and only if V_0^{(⊥)} = 0. We will see that the vector V_0^{(⊥)} plays an important role for the dynamics and, ultimately, it is the component of V_0 responsible for the lack of smoothness in the direction V_0. Therefore, in a way, the distribution ∆ is the one containing all the directions along which the problem (5) is smooth. We will come back to this later.
Under the UFG condition the integral manifolds (see Section 3.2 for the definition) of ∆_0 form a partition of the state space R^N. Let S be one such manifold.⁴ If X_0 = x ∈ S, then X_t^{(x)} ∈ S̄ for all t ≥ 0. That is, if the process starts from one of the manifolds of the partition, then it remains in the closure of such a manifold; but, crucially, it can hit the (topological) boundary ∂S := S̄ \ S of the manifold S. This is the content of Proposition 4.7. Such a statement is obtained by combining the known geometric theory of distributions of non-constant rank with the classical Stroock-Varadhan support theorem. We further prove that if X_t hits the boundary ∂S of the manifold S, then it never leaves it, see Proposition 5.1 and Note 5.2. Therefore: i) because the dimension of the boundary ∂S is smaller than the dimension of S, along the path of X_t^{(x)} the rank of the distribution cannot increase; ii) if the solution of the SDE leaves the manifold S from which it started, then any invariant measure can only be supported on the boundary ∂S of such a manifold, see Proposition 5.7.
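This picture can be observed numerically on the geometric Brownian motion of Example 3.2 below, whose manifolds are (−∞, 0), {0} and (0, ∞): a path started at x > 0 stays positive for all finite times, while the boundary manifold {0} is invariant. The parameter values below are illustrative; we use the exact Stratonovich solution X_t = x exp(bt + σB_t).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stratonovich GBM dX = b X dt + sigma X o dB has exact solution
# X_t = x * exp(b t + sigma B_t); parameters chosen for illustration.
b, sigma, T, n_paths = -1.0, 0.5, 5.0, 10_000

def X_T(x0):
    B_T = rng.normal(0.0, np.sqrt(T), size=n_paths)
    return x0 * np.exp(b * T + sigma * B_T)

paths_pos = X_T(1.0)   # started on the manifold (0, infinity)
paths_zero = X_T(0.0)  # started on the boundary manifold {0}

assert np.all(paths_pos > 0)      # never leaves (0, inf) in finite time
assert np.all(paths_zero == 0.0)  # {0} is invariant: once there, always there
print(paths_pos.mean())           # small for b < 0: mass drifts toward {0}
```

With b < 0 the paths accumulate near 0, consistent with an invariant measure supported on the boundary of the manifold from which the process started.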
Further understanding of the dynamics relies on the results of Section 4.2: in this section we show that, after an appropriate change of coordinates, any N-dimensional SDE of UFG type can be written, at least locally, as a system of the form

dZ_t = U_0(Z_t, ζ̂_t) dt + Σ_{i=1}^d U_i(Z_t, ζ̂_t) ∘ dB_i(t),   (10)
dζ̂_t = Ŵ_0(ζ̂_t) dt,   (11)

where ζ̂_t solves an ordinary differential equation (ODE), ζ̂_t ∈ R^{N−n},⁵ Z_t ∈ R^n, Ŵ_0 : R^{N−n} → R^{N−n} and U_i : R^N → R^n for every i ∈ {0, . . ., d}. Beyond details about the dimensionality of the ODE component, the important point is that the solution ζ̂_t of the ODE evolves independently of the SDE part, while the coefficients of the SDE depend on the evolution of the ODE. We will informally refer to such a representation as being of the form "ODE+SDE". In general, this representation is only local. This change of coordinates has been known for a long time in differential geometry; here we simply express it in a way which is more congenial to our setting and purposes, and we apply it to SDEs. While the change of coordinates itself is not new, to the best of our knowledge it has never been exploited in SDE theory. This local representation is both an important technical tool throughout the paper and a fundamental element in understanding the evolution of the dynamics. Referring to the PDE (5), we also note here that the change of coordinates gives a geometric interpretation of the (potential) loss of smoothness in the direction V_0 and of the reason why smoothness is instead maintained in the direction V; see Note 4.12 on this point. In view of the discussed change of coordinates, it makes sense to start by studying UFG dynamics for which the representation (10)-(11) is global. For this reason, in Section 6 we consider systems which are (globally) of the form (10)-(11), where the ODE is assumed to be one-dimensional and the SDE satisfies a form of Hörmander condition. More precisely, the dynamics studied in Section 6 are non-autonomous hypoelliptic SDEs; because the topic is of somewhat independent interest, this section has been written in such a way that it can be read independently of the rest of the paper. Non-autonomous SDEs and their associated two-parameter semigroups have been studied in [7], where a detailed analysis of the law of the process is carried out; in [11], where the associated semigroup is examined; and in [14,30], where the authors introduce some interesting techniques to deal with the analysis of invariant measures and long-time behaviour of time-inhomogeneous processes. The work [7] assumes that the non-autonomous SDE is hypoelliptic, while in [30] a uniform ellipticity assumption is enforced. From a technical point of view, the results of Section 6 extend the approach of [30, Section 6.1] to the hypoelliptic setting. However, the main difference between our results and those in [30] is that here we highlight the fact that the process may admit several invariant measures, and we characterize the basin of attraction of each of them. In this setting convergence to equilibrium is driven by the ODE component. We will indeed show that the basin of attraction of each invariant measure can be completely described by just looking at the behaviour of the solution of the ODE. Because the ODE is assumed to be one-dimensional and autonomous, it can only behave monotonically, so the analysis of the ODE and of the full problem is relatively intuitive in this setting (see Section 6 for details). In Section 7 we consider the general case of UFG processes, for which the representation (10)-(11) is only local. While this case is substantially richer than the previous one, the fact that we can always represent the SDE (1) locally as a system of the form ODE+SDE still means that there is some deterministic behaviour which is intrinsic to the UFG dynamics.

⁴ By definition of integral manifold, on each of these manifolds the rank of the distribution ∆_0 is constant and equal to the dimension of the manifold itself.
⁵ To make the link with the more precise notation that we will use in Section 4.2, we are denoting here by ζ̂_t the components (ζ_t, a_t) in (29)-(30), i.e. ζ̂_t = (ζ_t, a_t).
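A minimal Euler-type simulation of a system in the global "ODE+SDE" form, with a one-dimensional ODE component as in Section 6, can be sketched as follows; all coefficients below are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical coefficients, chosen for illustration only.
def W0_hat(zeta):     # one-dimensional autonomous ODE drift (monotone dynamics)
    return -zeta

def U0(zeta, z):      # the SDE drift is modulated by the ODE component...
    return -(1.0 + zeta**2) * z

def U1(zeta, z):      # ...and so is the diffusion coefficient
    return 1.0 + 0.5 * np.sin(zeta)

def simulate(z0, zeta0, T=10.0, dt=1e-3):
    n = int(round(T / dt))
    z, zeta = z0, zeta0
    for _ in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))
        # the ODE evolves autonomously, never seeing z...
        zeta_new = zeta + W0_hat(zeta) * dt
        # ...while the SDE coefficients depend on zeta
        z = z + U0(zeta, z) * dt + U1(zeta, z) * dB
        zeta = zeta_new
    return z, zeta

z, zeta = simulate(z0=2.0, zeta0=3.0)
print(z, zeta)  # zeta is close to 3*exp(-10), regardless of the noise
```

The one-way coupling is visible in the update: zeta_new depends only on zeta, which is exactly the structural feature that makes the long-time behaviour of the full system readable off the ODE.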
It turns out that one is still able to single out this deterministic behaviour. Recalling the definition of the vector V_0^{(⊥)}, formula (9), we will show that the (N-dimensional) ODE

dy_t/dt = V_0^{(⊥)}(y_t)

plays, in this more general context, the same driving role that the ODE (11) had in the context of Section 6. Motivated by the above discussion, we introduce an auxiliary process Z_t, defined precisely in Section 7. This process is non-autonomous and, as we will explain, it can be interpreted geometrically as being a projection of the process X_t on an appropriate integral manifold of the distribution ∆. We apply the techniques of Section 6 to the study of such a non-autonomous process, producing results on the long-time behaviour of Z_t. We then relate the asymptotic behaviour of Z_t to the asymptotic behaviour of X_t. Notice that the procedure just described is somewhat the reverse of the one traditionally used (and it is, to the best of our knowledge, new): given a non-autonomous system, the established methodology consists of increasing the dimension of the state space by adding time as an auxiliary variable, thereby reducing the given non-autonomous system to a (larger) autonomous one. Here we do the converse: by projecting the process on an appropriate manifold, we reduce the given autonomous process to a (lower-dimensional) non-autonomous one, Z_t, with the advantage that the techniques of Section 6 can now be adapted to prove statements on Z_t. Once the latter process has been understood, we deduce results about the autonomous process X_t from those shown for Z_t.
From a probabilistic point of view it is clear that, in absence of the Hörmander condition, we cannot expect the process X_t to have a density with respect to the Lebesgue measure. This is made explicit by the local representation (10)-(11), which also clarifies that it is the ODE component that is responsible for the lack of smoothness. Notice also that, in the coordinates (10)-(11), the vector V_0^{(⊥)} is given by V_0^{(⊥)} = (0, . . ., 0, Ŵ_0), i.e. it is precisely the vector driving the ODE behaviour (we elaborate on this fact in Note 4.12). However, in Section 8 we show that the law of the SDE (1) still has a density on an appropriate time-dependent submanifold, which can be explicitly described. In order to do so, we correct and then extend the results of [53].
One may also wish to point out that systems of the form "ODE+SDE" appear as diffusion limits of some Metropolis-Hastings-type algorithms, see e.g. [29]. It is noted in [29] that one may use the ODE as a way to monitor convergence. We conjecture that the lack of smoothness of UFG processes could be seen as an advantage in the context of sampling. The authors intend to explore this fact in future work. Finally, we mention in passing that UFG processes play a fundamental role in the study of cubature methods; see [10] and references therein for a complete account of the matter.
1.3. Organisation of the paper. In Section 2 we introduce the standing notation for the remainder of the manuscript. To make the paper self-contained, in Section 3 we gather background definitions and notions. In particular, Subsection 3.1 contains details of the UFG condition, while Subsection 3.2 covers basic definitions and standard results in differential geometry and (stochastic) control theory. In Section 4 we exploit the existing theory of distributions of non-constant rank to produce both global and local results about the SDE (1) under the UFG condition: in Section 4.1 we cover the global behaviour of the SDE, while in Section 4.2 we study local properties. In Section 5 we introduce several results for UFG diffusions. These results are quite general, in the sense that most of them are valid under just the UFG condition. The following Section 6 can be read independently of the rest of the manuscript: in this section we describe the long-time behaviour of hypoelliptic SDEs of non-autonomous type. The class of SDEs considered in Section 6 is one for which the representation of the form "ODE+SDE" is global. This is the first section where we address the problem of studying the basin of attraction of different invariant measures. In Section 7 we instead study the long-time behaviour of (1) in the general UFG case (in which the change of coordinates is only local). Finally, Section 8 is devoted to the study of the density of the process, via Malliavin calculus.

2. Notation
We will be interested in N -dimensional SDEs, of the form (1). The letter N will only be used to refer to the dimension of the state space. While examples of UFG diffusions can be found in any dimension, it is fair to say that the theory we develop in this paper is mostly interesting in dimension N ≥ 2, so we will make this a standing assumption which will hold unless otherwise stated in specific examples.
If x is a point in R^N, we denote the j-th coordinate of x by x^j, i.e. x = (x^1, . . ., x^N) (this is coherent with (3)). We will often use a local change of coordinates, presented in Section 4.2. The change of coordinates will be given by a local diffeomorphism Φ : R^N → R^N, and the new coordinates will typically be denoted by z, i.e. z = Φ(x). In the new coordinate system it will be of particular importance to distinguish the role of the first n coordinates of z from the others (n being an appropriate integer, n < N). In particular, if N − n > 1, we will use the following notation:

z = (z_1, . . ., z_n, z_{n+1}, z_{n+2}, . . ., z_N) = (z, ζ, a),

where (z_1, . . ., z_n) = z ∈ R^n, z_{n+1} = ζ ∈ R and (z_{n+2}, . . ., z_N) = a ∈ R^{N−(n+1)}. The last block of coordinates plays a role which is different from the role of the first two blocks, as will be explained (the coordinates in the last block should rather be intended as parameters). If N = n + 1 then simply z = (z, ζ). A similar reasoning holds for the vector fields appearing in (1): for any j ∈ {0, . . ., d}, V_j = (V_j^1, . . ., V_j^N), and Ṽ_j will denote the vector V_j expressed in the new coordinate system z = Φ(x).
We will show that in the new coordinate system one has

Ṽ_i = (U_i, 0, . . ., 0) for i ∈ {1, . . ., d},   Ṽ_0 = (U_0, W_0, 0, . . ., 0),

where U_j : R^N → R^n, while W_0 is a real-valued function which depends only on the last two blocks of coordinates of z, i.e. W_0 : R^{N−n} → R. Accordingly, if X_t ∈ R^N is the solution at time t of the SDE (1), then X_t^j denotes the j-th component of X_t. We will sometimes want to stress the dependence of the solution X_t on the initial datum; when this is the case, we will write X_t^{(x)} if X_0 = x. Finally, given a probability measure µ and a function f which is integrable with respect to µ, we shall define µ(f) by

µ(f) := ∫ f(x) µ(dx).

We shall use the following function spaces throughout the paper. For any N ≥ 1 and any closed set E ⊆ R^N:
• We denote by C_b(E) the space of all functions f : E → R which are continuous and bounded; this space will be endowed with the supremum norm.
• We say that a real-valued function f is C^∞ if it has continuous derivatives of all orders. We denote by C^∞_V(R^N) the set of all bounded C^∞ functions f : R^N → R which have bounded derivatives of all orders and such that the following holds for all n ∈ N:

3. Preliminaries and Assumptions
3.1. The UFG condition. Fix d ∈ N and let A be the set of all k-tuples, of any size k ≥ 1, of integers of the following form:

A := {α = (α_1, . . ., α_k) : k ≥ 1, α_j ∈ {0, 1, . . ., d}} \ {(0)}.

We emphasise that all k-tuples of any length k ≥ 1 are allowed in A, except the trivial one, α = (0) (however, singletons α = (j) belong to A if j ∈ {1, . . ., d}). We endow A with the product operation α * β := (α_1, . . ., α_h, β_1, . . ., β_ℓ), for any α = (α_1, . . ., α_h) and β = (β_1, . . ., β_ℓ) in A. If α = (α_1, . . ., α_h) ∈ A, we define the length of α, denoted by ‖α‖, to be the integer

‖α‖ := h + card{j : α_j = 0}.

For any m ∈ N, m ≥ 1, we then introduce the sets

A_m := {α ∈ A : ‖α‖ ≤ m}.

Given a vector field (or, equivalently, a first-order differential operator) V = (V^1(x), V^2(x), . . ., V^N(x)) on R^N, we refer to the functions {V^j(x)}_{1≤j≤N} as the components or coefficients of the vector field. We say that a vector field is smooth, or that it is C^∞, if all the components V^j(x), j = 1, . . ., N, are C^∞ functions. Given two differential operators V and W, the commutator between V and W is defined as

[V, W] := VW − WV.

Let now {V_i : i = 0, . . ., d} be a collection of vector fields on R^N and let us define the following "hierarchy" of operators:

V_{[i]} := V_i for i ∈ {1, . . ., d},   V_{[α*i]} := [V_{[α]}, V_i] for α ∈ A, i ∈ {0, . . ., d}.

This hierarchy is completely analogous to the one constructed in the Introduction; here we just need a more detailed notation. Note that if ‖α‖ = h then ‖α * i‖ = h + 1 if i ∈ {1, . . ., d} and ‖α * i‖ = h + 2 if i = 0. If α ∈ A is a multi-index of length h, with abuse of nomenclature we will say that V_{[α]} is a differential operator of length h. We can then define the space R_m to be the space containing all the operators of the above hierarchy, up to and including the operators of length m (but excluding V_0); that is, R_m is the set of vector fields of the form

Σ_{j=1}^k φ_j V_{[γ_j]},  φ_j ∈ C^∞_V(R^N),   (15)

for all k and all γ_1, . . ., γ_k ∈ A_m. With this notation in place we can now introduce the definition that will be central in this paper.
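The bookkeeping for multi-indices (the length ‖α‖, which counts the entry 0 twice, and the sets A_m) is easy to mechanize; a small Python sketch, with d = 2 chosen for illustration:

```python
from itertools import product

d = 2  # number of diffusion fields, illustrative choice

def length(alpha):
    # UFG length of a multi-index: each entry counts 1, except 0 which counts 2,
    # consistent with ||alpha * i|| = ||alpha|| + 1 for i >= 1 and + 2 for i = 0
    return len(alpha) + sum(1 for a in alpha if a == 0)

def A_m(m):
    # all multi-indices over {0, ..., d} of length <= m, excluding the trivial (0,)
    out = []
    for k in range(1, m + 1):  # a tuple of size k has length >= k
        for alpha in product(range(d + 1), repeat=k):
            if alpha != (0,) and length(alpha) <= m:
                out.append(alpha)
    return out

print(length((1, 0)))  # prints 3: the entry 0 counts twice
print(A_m(2))          # [(1,), (2,), (1, 1), (1, 2), (2, 1), (2, 2)]
```

Note that (0, 1) has length 3 and is therefore excluded from A_2, while the singletons (1) and (2) belong to it.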
Definition 3.1 (UFG condition). Let {V_i : i = 0, . . ., d} be a collection of smooth vector fields on R^N and assume that the coefficients of such vector fields have bounded partial derivatives (of any order). We say that the fields {V_i : i = 0, . . ., d} satisfy the UFG condition if there exists m ∈ N such that any V_{[α]}, α ∈ A, can be written in the form

V_{[α]}(x) = Σ_{β ∈ A_m} φ_{α,β}(x) V_{[β]}(x),  φ_{α,β} ∈ C^∞_V(R^N).   (17)

Again we emphasize that the set of vector fields appearing in the linear combination on the right-hand side of the above identity does not include V_0. It may be useful to compare the UFG condition with the Hörmander condition (HC), the uniform Hörmander condition (UHC) and the parabolic Hörmander condition (PHC), which we now recall. The HC is satisfied if

span{Lie{V_0(x), V_1(x), . . ., V_d(x)}} = R^N  for every x ∈ R^N.

The PHC has been recalled in the introduction, see (PHC). We notice in passing that, while the space R_m is in general different from the space L_m, it is the case that ∪_{j≥1} R_j = ∪_{j≥1} L_j. The UHC (see [9]) is instead satisfied if there exist ℓ ≥ 1 and κ > 0 such that

Σ_{α ∈ A_ℓ} ⟨V_{[α]}(x), y⟩² ≥ κ |y|²  for every x, y ∈ R^N.

In the above, each term of the sum is the scalar product between the vector V_{[α]}(x) and the vector y ∈ R^N. Notice that the UHC is the strongest of all these conditions, in the sense that it implies each of the others. However, neither the HC nor the PHC imply the UFG condition (as one may, in general, need infinitely many fields to satisfy the PHC or the HC). We also note that, while the various Hörmander conditions are imposed on an appropriate Lie algebra, the UFG condition is rather a condition on the set of vectors {V_{[α]}, α ∈ A_m}, seen as a module over the ring C^∞_V.
Example 3.2. Consider one-dimensional geometric Brownian motion, that is, the solution of the following SDE:

dX_t = b X_t dt + σ X_t ∘ dB_t.

Here V_0(x) = bx and V_1(x) = σx, so that [V_1, V_0] = 0 and, more generally, every iterated commutator vanishes. These vector fields satisfy the UFG condition with m = 1; however, V_0 and V_1 vanish when x = 0, so the HC is not satisfied.
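This computation can be checked symbolically; the sketch below uses sympy, with b and σ the (illustrative) drift and diffusion constants of the example.

```python
import sympy as sp

x, b, sigma = sp.symbols('x b sigma')

V0 = b * x       # drift field of geometric Brownian motion
V1 = sigma * x   # diffusion field

def bracket(V, W):
    # one-dimensional Lie bracket: [V, W] = V W' - W V'
    return sp.simplify(V * sp.diff(W, x) - W * sp.diff(V, x))

# Every iterated bracket vanishes, so each V_[alpha] is trivially a
# C-infinity-combination of V_[(1)] = V1: the UFG condition holds with m = 1.
assert bracket(V1, V0) == 0
assert bracket(V0, V1) == 0
# But both fields vanish at x = 0, so the Hormander condition fails there.
assert V0.subs(x, 0) == 0 and V1.subs(x, 0) == 0
print("all brackets vanish; both fields degenerate at x = 0")
```

The same two-line bracket helper can be used to check the value m = 4 claimed in Example 3.3, once the two fields there are written down.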
Example 3.3. Consider the following first-order differential operators on R². Then {V_0, V_1} do not satisfy the Hörmander condition (e.g. there is always a degeneracy at x = 0), but they do satisfy the UFG condition with m = 4. If the role of the fields is exchanged, i.e. if we set

Note 3.4. Because the functions φ_{α,β} appearing in (17) belong to C^∞_V(R^N), if the UFG condition holds for some m ∈ N then it also holds for any ℓ ≥ m, ℓ ∈ N. In other words, if the UFG condition holds for some m ∈ N, then for any V_{[γ]} with ‖γ‖ > m one has

V_{[γ]}(x) = Σ_{β ∈ A_m} φ_{γ,β}(x) V_{[β]}(x).

For this reason it is appropriate to remark that in the remainder of the paper, when we assume that "the UFG condition is satisfied for some m", we mean the smallest such m.
We will consider diffusion semigroups {P t } t≥0 of the form (2); that is, we consider Markov semigroups associated with the stochastic dynamics (1). In particular, we will be interested in studying the semigroup P t when the vector fields {V 0 , V 1 , . . ., V d } satisfy the UFG condition.
As we have already mentioned, the UFG condition is strictly weaker than the uniform Hörmander condition. However one can still prove that, when such a condition is satisfied by the vector fields {V 0 , V 1 , . . ., V d } appearing in the generator (4), the semigroup P t still enjoys good smoothing properties: if f (x) is continuous then (P t f )(x) is differentiable (infinitely many times) in all the directions spanned by the vector fields contained in R m (we recall that the set R m is defined in (15)). See Appendix A.2 for more details.
When the semigroup P_t is elliptic or hypoelliptic, several works have dealt with the study of the long- and short-time behaviour of the derivatives of the semigroup; for a review see [3,38]. To the best of our knowledge, the only work addressing the study of the long-time behaviour of the derivatives of UFG semigroups is [12]. In [12] the authors identify a sufficient condition for exponential decay of the derivatives of the solution of (5). To be more precise, they proved the following: suppose the vector fields {V_0, V_1, . . ., V_d} satisfy the UFG condition and assume there exists λ_0 > 0 such that the estimate (18) holds for all f sufficiently smooth and for every α ∈ A_m. If λ_0 is sufficiently large then, for every r > 0 and t_0 > 0, we may find a constant c_{t_0,r} > 0 such that for any f ∈ C_b(R^N), t ≥ t_0 and α ∈ A_m we have

sup_{x ∈ B(0,r)} |(V_{[α]} P_t f)(x)| ≤ c_{t_0,r} e^{−λt} ‖f‖_∞,   (19)

for some λ > 0. In the above, B(0, r) is the centred ball (of R^N) of radius r. Condition (18) was named the obtuse angle condition (OAC) in [12]. Here we will need a second-order version of such a result as well.
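The flavour of the decay estimate can be seen explicitly on a one-dimensional Ornstein-Uhlenbeck process dX_t = −X_t dt + dB_t (an illustrative example of ours, not taken from [12]): since X_t^{(x)} = e^{−t}x + Gaussian noise, one has ∂_x(P_t f)(x) = e^{−t} E[f′(X_t^{(x)})], which decays exponentially in t. A Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-d OU: dX = -X dt + dB, so X_t = e^{-t} x + N(0, (1 - e^{-2t}) / 2).
def dPtf_dx(f_prime, x, t, n=200_000):
    # By the explicit flow, d/dx (P_t f)(x) = e^{-t} * E[f'(X_t^{(x)})]
    std = np.sqrt((1.0 - np.exp(-2.0 * t)) / 2.0)
    X = np.exp(-t) * x + std * rng.normal(size=n)
    return np.exp(-t) * f_prime(X).mean()

f_prime = np.cos  # f = sin, so |f'| <= 1
g1 = abs(dPtf_dx(f_prime, x=1.0, t=1.0))
g4 = abs(dPtf_dx(f_prime, x=1.0, t=4.0))
print(g1, g4)  # the derivative of the semigroup decays exponentially in t
```

Here |∂_x(P_t f)(x)| ≤ e^{−t}‖f′‖_∞ exactly, which is the elliptic analogue of the bound (19) above.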
Lemma 3.5. Let P_t be the semigroup associated with the SDE (1) and assume that the vector fields V_0, ..., V_d satisfy the UFG condition. Suppose moreover that the following holds: there exists λ_0 > 0 such that for every x ∈ R^N and for all α, β ∈ A_m such that α ≠ β and α, β ∉ {1, ..., d}. If λ_0 > 0 is large enough then, for any t_0 ∈ (0, 1) and any r > 0 there exists a constant c_{t_0,r} > 0 such that, for some λ > 0, one has sup_{x ∈ B(0,r)} for all α, β ∈ A_m, all t > t_0 and for every f continuous and bounded.
Proof of Lemma 3.5. See Appendix B.
Example 3.6 (UFG condition and Obtuse Angle Condition for linear SDEs). Consider SDEs in R^N of the form where A is a constant N × N matrix, B^1_t, ..., B^d_t are one-dimensional standard Brownian motions and D, C_1, ..., C_d ∈ R^N are constant vectors. In this case V_0(x) = Ax + D, V_i(x) = C_i, and Because [V_i, V_j] = 0 for every i, j ∈ {1, ..., d}, the only relevant commutators are those of the form V_[(i,0,...,0)], i.e. repeated commutators with V_0. For simplicity, let α_{i,k} be the (k+1)-tuple such that α^1_{i,k} = i and α^j_{i,k} = 0 for j > 1; then It is now easy to show that, irrespective of the choice of A, D, C_1, ..., C_d as above, the UFG condition is always satisfied by SDEs of the form (22), where Q is the overall diffusion matrix of (22), see e.g. [38]. As for the OAC (18), this is satisfied if and only if there exists some λ > 0 such that for all f sufficiently smooth we have

3.2. Geometry. In this section we cover some basic notions from differential geometry and geometric control theory on which the rest of the paper relies. Further details can be found in the excellent references [26,54,55]. For the reader who is already familiar with this material, we point out that, among the results recalled in this section, Theorem 3.12 is possibly the one which will play the most important role in the remainder of the paper.

Given a vector field V(x) on R^N, we denote by e^{tV}(x) the integral curve of V starting at t = 0 from x, i.e. the curve γ(t) : R → R^N such that γ(0) = x and γ̇(t) = V(γ(t)) for all t ∈ R for which the curve is defined. In general, integral curves exist only locally. In this paper we consider only smooth, globally defined and globally Lipschitz vector fields (see Hypothesis 3.15), so integral curves actually exist for every t ∈ R. As already mentioned, a distribution ∆ on R^N is a map that, to each point x ∈ R^N, associates a linear subspace of the tangent space T_x R^N.
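Before moving on, let us go back to Example 3.6 once more. The claim that linear SDEs always satisfy the UFG condition can be checked numerically: since [V_i, V_j] = 0 and the iterated brackets of V_i = C_i with V_0 = Ax + D are the constant vectors A^k C_i, the Cayley-Hamilton theorem forces A^N C_i into span{C_i, A C_i, ..., A^{N-1} C_i}. A minimal sketch of this check, with a hypothetical (randomly generated) matrix A and a single diffusion vector C:

```python
import numpy as np

# For the linear SDE dX = (A X + D) dt + sum_i C_i dB_i, the iterated brackets of
# V_i = C_i with V_0 = Ax + D are the constant vectors A^k C_i. By Cayley-Hamilton,
# A^N C_i lies in span{C_i, A C_i, ..., A^{N-1} C_i}, so the UFG condition holds.
rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N))   # hypothetical drift matrix
C = rng.standard_normal(N)        # hypothetical diffusion vector

# Columns C, AC, ..., A^{N-1}C; check that A^N C is a linear combination of them.
krylov = np.column_stack([np.linalg.matrix_power(A, k) @ C for k in range(N)])
target = np.linalg.matrix_power(A, N) @ C
phi, *_ = np.linalg.lstsq(krylov, target, rcond=None)
residual = np.linalg.norm(krylov @ phi - target)
print(residual < 1e-8)  # True: the highest bracket is expressible via lower ones
```

The coefficients `phi` play the role of the (here constant) functions ϕ_{α,β} of the UFG condition.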
Given a set D of smooth vector fields on R^N, the distribution generated by D, denoted by ∆_D, is the map x ↦ span{X(x) : X ∈ D}. Distributions generated by a set of smooth vector fields are usually referred to as smooth distributions. When we write ∆_D instead of just ∆ it is understood that we are considering smooth distributions rather than general distributions. As customary, we say that the vector field X on R^N belongs to the distribution ∆ if X(x) ∈ ∆(x) for all x ∈ R^N. The rank of ∆ at x is the dimension of the vector space ∆(x). A piecewise integral curve, γ, of vector fields in the set D is a curve of the form where X_1, ..., X_h ∈ D (and they are not necessarily all distinct). A maximal integral manifold (MIM) of ∆, M, is a connected integral manifold of ∆ which is maximal in the sense that every other connected integral manifold of ∆ that contains M coincides with M. Therefore, two MIMs either coincide or they are disjoint.
• ∆ is invariant under the vector field V if the Jacobian matrix J_x(e^{tV}x) maps ∆(x) into ∆(e^{tV}x) for all x and for all t.
• Suppose ∆ is generated by the collection of vector fields D = {X_1, ..., X_k}, i.e. ∆ = ∆_D.
Then two points x, y ∈ R N belong to the same orbit of ∆ D if there exists a curve γ : [a, b] → R N such that γ(a) = x, γ(b) = y and γ is a piecewise integral curve of vectors in D.
In general, the integral manifolds of a given distribution are "smaller" than the orbits; we refer the reader to [55] for a detailed explanation of this matter, see in particular [55,Eqn. (3.1)]. Here we just illustrate this fact with a simple but important classical example.
Example 3.8. In R^2, consider the vector fields X = ∂_x and Y = ψ(x)∂_y, where ψ(x) is a smooth function vanishing on the half-plane x ≤ 0. The orbit of the distribution generated by X and Y, ∆_{X,Y}, is the whole of R^2. That is, given any two points in R^2 there is a piecewise integral curve of {X, Y} which joins them. However, the integral manifolds through points (x, y) with x ≤ 0 are one dimensional. Notice that the distribution in this example is involutive but it satisfies neither the Hörmander condition nor the UFG condition. More precisely, whether we take X = V_0 and Y = V_1 or vice versa, either way the UFG condition is not satisfied (in the language of Definition 3.10 below, the set {X, Y} is neither locally nor globally of finite type). The fact that {X, Y} don't satisfy the UFG condition can either be seen as a consequence of Theorem 3.12 below (if they did, the orbits would have to coincide with the integral manifolds) or it can be shown by direct calculation (the problem arising on the line x = 0). For the reader's convenience this calculation is contained in the Appendix, see Lemma A.2. (A useful criterion to check whether a distribution is invariant under a vector field will be given in Note 3.11.)
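The orbit claim of Example 3.8 can be verified by composing explicit flows. Below, the function ψ is a concrete (hypothetical) choice of smooth function vanishing for x ≤ 0; the flows of X and Y are then available in closed form, and a three-piece integral curve joins two points that sit on different one-dimensional integral manifolds:

```python
import math

# A concrete choice (our own) of a smooth function vanishing on x <= 0:
def psi(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

def flow_X(p, t):   # e^{tX}(x, y) = (x + t, y), since X = d/dx
    x, y = p
    return (x + t, y)

def flow_Y(p, t):   # e^{tY}(x, y) = (x, y + t*psi(x)): psi is constant along Y
    x, y = p
    return (x, y + t * psi(x))

# Join (-1, 0) to (-1, 1): both lie on one-dimensional integral manifolds {x = -1},
# yet a piecewise integral curve of {X, Y} connects them, so the orbit is all of R^2.
p = (-1.0, 0.0)
p = flow_X(p, 2.0)             # move right to (1, 0), where psi(1) = e^{-1} > 0
p = flow_Y(p, 1.0 / psi(1.0))  # move vertically up to (1, 1)
p = flow_X(p, -2.0)            # move back left to (-1, 1)
print(p)  # (-1.0, 1.0)
```

The same three-piece construction joins any two points, which is exactly why the orbit of ∆_{X,Y} is the whole plane while the integral manifolds on x ≤ 0 are only lines.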
We say that a distribution ∆ on R^N satisfies the (maximal) integral manifolds property if through every point of R^N there passes a (maximal) integral manifold of ∆. The following fundamental result, due to Sussman (see [55, Theorem 4.2]), completely characterizes the distributions enjoying the maximal integral manifolds property. In view of the equivalence of (a) and (b) above, when either property holds we just say that the smooth distribution is integrable. It is clear that in this case every integral manifold is a maximal integral manifold. Some standard facts about integrable distributions which are useful to bear in mind, and which follow (easily) from what we have said so far: if ∆_D is integrable, then iii): the rank of the distribution is constant along the orbits (of ∆_D, which coincide with the integral manifolds of such a distribution).
The latter fact is a consequence of the fact that ∆_D is invariant under the vectors in D, together with the observation that the maps e^{tV} are diffeomorphisms for every fixed t ∈ R (hence the Jacobian matrix J_x(e^{tV}x), which maps the tangent space at x into the tangent space at e^{tV}x, is always invertible).

Definition 3.10. Let D be a set of everywhere defined, smooth vector fields on R^N and ∆_D be the associated distribution. The set D (as well as the distribution ∆_D) is locally of finite type or locally finitely generated (LFG) if for every x ∈ R^N there exist vector fields X_1, ..., X_k such that ii): for every X ∈ D there exists a neighbourhood U of x and C^∞ functions ϕ_{i,j} defined on U such that for all x ∈ U and every i ∈ {1, ..., k}.
We emphasize that if ∆_D is LFG then the rank of ∆_D need not be constant. The next theorem gives a sufficient condition for integrability, which is easy to check in practice. Seen from a control-theoretical point of view, the above statement gives a global decomposition of the state space R^N into sets reachable by piecewise integral curves of vector fields in D. To clarify this fact and provide some context, it is useful to compare it with the case where the HC holds. Start by noting that under the HC the Lie algebra generated by the vectors in D is required to have constant rank (and the rank is assumed to be precisely N at every point). The control-theoretical meaning of the HC is expressed by Chow's Theorem, see [4,6] (and indeed in control theory the HC is known as Chow's condition). Chow's theorem states that if the vectors {V_0, ..., V_d} satisfy (HC) then any two points in R^N are accessible, or reachable, in finite time from each other along integral curves of the vectors in D. That is, given any two points x, y ∈ R^N, there exists a piecewise integral curve γ of vectors in D and a time t > 0 such that γ(0) = x and γ(t) = y. This is not the case if we simply assume that D is an LFG set of vector fields. According to the above theorem, if D is LFG then, for every x ∈ R^N, the set of states reachable from x in finite time coincides with the maximal integral manifold of ∆_D through x. Because the rank of the distribution is not constant, and in particular it need not be N at any point, this implies that, in general, the orbits of ∆_D will be proper subsets of R^N (as we have mentioned, they form a partition of R^N).
We conclude this subsection by recalling the following result, which will be used later on.
Lemma 3.14 ( [26, Theorem 2.1.9]). Let ∆ D be a smooth involutive distribution invariant under a vector field W . Suppose ∆ D is locally finitely generated. Let x 1 , x 2 be two points belonging to the same maximal integral manifold of ∆ D . Then, for all t ∈ R, the points e tW x 1 and e tW x 2 belong to the same maximal integral submanifold of ∆ D .
To clarify the above statement: under the assumptions of the lemma, if x_1, x_2 belong to a given MIM of ∆_D, say M, then e^{tW}x_1, e^{tW}x_2 ∈ M̃, where M̃ denotes another generic MIM of ∆_D. In general M̃ will be different from M (unless W belongs to ∆_D); see, for example, Example 4.9.
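To make Lemma 3.14 concrete, here is a toy illustration (the fields below are our own choice, anticipating Example 4.10, and are not part of the lemma): on R^2 minus the origin, let ∆ be generated by the Euler field X(x, y) = (x, y), whose maximal integral manifolds are the open radial half-lines, and let W(x, y) = (−y, x), whose flow is rotation by angle t. Then ∆ is involutive and invariant under W, so the flow of W must map two points of one ray to two points of a common (rotated) ray:

```python
import math

# Flow of the rotation field W(x, y) = (-y, x): e^{tW} is rotation by angle t.
def flow_W(p, t):
    x, y = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))

def same_ray(p, q, tol=1e-12):
    # Two nonzero points lie on the same open radial half-line iff their angles agree.
    return abs(math.atan2(p[1], p[0]) - math.atan2(q[1], q[0])) < tol

x1, x2 = (1.0, 0.0), (2.0, 0.0)   # same MIM: the positive x-axis
t = 0.7
y1, y2 = flow_W(x1, t), flow_W(x2, t)
print(same_ray(y1, y2))  # True: e^{tW}x1 and e^{tW}x2 share a (rotated) MIM
print(same_ray(x1, y1))  # False: the new MIM differs, since W is not in the distribution
```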
3.3. Assumptions. Throughout the paper we will make the following standing assumptions. When we say that condition (18) ((20), respectively) is satisfied, we mean that it is satisfied for some λ_0 > 0 large enough so that the estimate (19) ((21), respectively) follows.
Note 3.16. Assumption [SA.1] will be needed mostly to make sure that all the integral curves of the involved vector fields are well defined (and to guarantee well-posedness of the SDE (1)). However see Note 4.14 on this point.

4. Geometrical significance of the UFG condition and implications for the corresponding SDE
In this section we explain how the general results outlined in Section 3.2 apply to the study of the dynamics (1), assuming that the vector fields V_0, ..., V_d satisfy the UFG condition. For clarity, we will compare with the case in which V_0, ..., V_d satisfy the Hörmander condition. Subsection 4.1 contains global results, Subsection 4.2 is focussed on local results.

4.1. Global Results. Recalling the notation and nomenclature of Section 3.1 and motivated by Theorem 3.9, we introduce two distributions associated with the vector fields V_0, ..., V_d; such distributions will play a fundamental role in the analysis of the UFG-dynamics (1). Let
• ∆_0 be the smallest distribution which contains the space span{V_0, V_1, ..., V_d} and is invariant under the vector fields {V_0, V_1, ..., V_d};
• ∆ be the smallest distribution which contains the space span{V_1, ..., V_d} and is invariant under the vector fields {V_0, V_1, ..., V_d}.
Let us denote by n = n(x) the rank of the distribution ∆(x). Notice that n = n(x) is a function of the point x ∈ R^N and, as such, its value can vary from point to point. As Lemma 4.1 below demonstrates, if at some point x ∈ R^N the rank of ∆ is n, then the rank of ∆_0 is at most n + 1; hence the index n + 1 in the notation ∆_{n+1} used below for ∆_0. We will typically assume that n < N, where N is the dimension of the state space R^N in which the vector fields V_0, ..., V_d live, see Note 4.11 on this point. We stress that ∆ may not contain the vector field V_0 itself (unless, for example, V_0 is a linear combination of V_1, ..., V_d). Lemma 4.1 below gives a simpler equivalent description of the distributions ∆ and ∆_0 (which is the one we gave in the introduction).
Recall the decomposition (9) and the definition of R_m, given in (15), and set R_{m,0} := R_m ∪ {V_0}. If the vector fields V_0, V_1, ..., V_d satisfy the UFG condition then the distributions ∆ and ∆_0 are locally of finite type. More precisely, the distributions span{R_m} and span{R_m ∪ {V_0}} are globally of finite type. This can be checked by using Note 3.4 (and the fact that nested commutators can always be expressed as linear combinations of hierarchical commutators, see [6, pages 11-12]).
Since the UFG condition implies that the sets R_m and R_{m,0} are locally of finite type, we can apply Theorem 3.12 to the distributions given by the span of R_m and R_{m,0}. By Lemma 4.1, the distributions ∆ and ∆_0 coincide with the span of R_m and R_{m,0}, respectively. As a corollary, we have the following proposition.

Proposition 4.3. If the vector fields V_0, V_1, ..., V_d satisfy the UFG condition, then both ∆_0 and ∆ enjoy the integral manifolds property. In particular the integral manifolds of ∆_0 coincide with the orbits of ∆_0 (and the same holds for the distribution ∆).
We denote by S (S′, respectively) a generic MIM of the distribution ∆ (∆_0, respectively). Consistently, S_x (S′_x, respectively) will denote the MIM of ∆ (∆_0, respectively) through the point x ∈ R^N. It is easy to see that for every x ∈ R^N, S_x ⊆ S′_x, so that S′_x is a disjoint union of integral manifolds of ∆. Notice that n = n(x) is constant along the orbits S of ∆.
It is important to observe that any deterministic dynamics started on a maximal integral manifold S′ of ∆_0 and following the integral curves of the fields V_0, ..., V_d will remain in S′ for any positive time (see Note 3.13). On the other hand, if X_0 = x is the initial datum of the stochastic dynamics (1) and x ∈ S′, then X_t belongs to the closure of S′ for all t ≥ 0. This is a consequence of the Stroock and Varadhan support theorem, which we recall below; see [5] for more details.
for some ψ_1, ..., ψ_d: Informally, Theorem 4.4 says that the stochastic dynamics (1) will access in time t the closure of the set reachable in time t by the control problem (24), as we vary the controls ψ_1, ..., ψ_d in a suitable set of functions.
Excursus 4.5. We would like to further elaborate on the comment started before Theorem 4.4. To this end, consider the following one-dimensional SDE: so that these fields satisfy both the HC and the PHC. According to Chow's theorem (see Note 3.13), if V_0, V_1 satisfy the HC then any two points in R can be joined through integral curves of such fields. However, if we start the dynamics (25) at x ∈ [−π/2, π/2] then the solution X_t never leaves the interval [−π/2, π/2]. This is not in contradiction with the statement of Chow's theorem. The behaviour of the stochastic dynamics (25) is related to the control problem (24). On the other hand, when we say that under the HC any two points in R^N can be joined by integral curves of vectors in D, this is equivalent to saying that the set of points reachable from x by the control system is indeed the whole space R^N (in the above, the functions ψ_1, ..., ψ_d : [0, T] → R are, say, piecewise constant controls). Clearly, the set of points accessible by (24) is a subset of the set of points accessible by (26). In our example, the support of the law of the solution to the SDE (25) is given by the (closure of the) set of points reachable by the control problem On the other hand, Chow's theorem applied to the vector fields V_0, V_1 refers to the problem Such a dynamics can indeed be steered to access the whole real line, no matter where it is started.
The theory summarised in Subsection 3.2 completely describes the sets accessible by the control problem (26), which are precisely the orbits of the vector fields V_0, ..., V_d. On the other hand, if we want to study the SDE (1) (under the UFG condition) then we are interested in understanding the behaviour of the control problem (24). Unfortunately, in full generality, one can only state the following (see [26, Section 2.2]).

Lemma 4.6. With the notation and nomenclature introduced so far, let V_0, ..., V_d be smooth vector fields on R^N satisfying the UFG condition. Then the set of points reachable from x by the control problem (24) is a subset of S′_x and it contains at least a non-empty open subset of S′_x.
Combining the above and Theorem 4.4 we obtain the following.

Proposition 4.7. Consider the SDE (1) with initial datum X_0 = x and assume that the vector fields V_0, ..., V_d satisfy the UFG condition. Then X_t belongs to the closure of S′_x for every t ≥ 0 (the closure being intended in the Euclidean topology).

Let us reiterate that Proposition 4.7 doesn't say that X_t^{(x)} will explore the whole of the closure of S′_x (that is, it doesn't imply irreducibility of the process on that set), it simply means that the process X_t will not leave such a set.

4.2. Local considerations: an important change of coordinates. Let x ∈ R^N be a regular point of a given distribution ∆, i.e. suppose there exists a neighbourhood of x where the dimension of ∆ is constant, say equal to n. If this is the case then, locally, there exist n linearly independent vector fields, {X_1, ..., X_n} = D_n, generating the distribution. Suppose furthermore that ∆_{D_n} is involutive and n < N (see Note 4.11). For some small enough ε > 0 we can define the map Ψ : (−ε, ε)^N → R^N as follows: where X_1, ..., X_n are as above and X_{n+1}, ..., X_N are such that span{X_1, ..., X_n, X_{n+1}, ..., X_N} = R^N (at least locally). The map Ψ is, at least locally, a diffeomorphism onto its image, so it admits an inverse, which we denote by Φ. Differentiating the obvious identity (Φ ∘ Ψ)(t) = t, one obtains Let us make the above notation more explicit. The map Φ is a map from (open sets of) R^N to (open sets of) (−ε, ε)^N, Φ = (Φ^1, ..., Φ^N), where Φ^i : R^N → R. Therefore the i-th row of the matrix J_x Φ is the gradient ∇Φ^i. On the other hand, the j-th column of the matrix J_t Ψ is the vector ∂Ψ/∂t_j := (∂Ψ^1/∂t_j, ..., ∂Ψ^N/∂t_j)^T. The first n columns of the Jacobian matrix (J_t Ψ)(t) are linearly independent (because Ψ is a diffeomorphism) and, from the above, we have ∇Φ^i · ∂Ψ/∂t_j = 0 for all j = 1, ..., n, i = n + 1, ..., N.
By the involutivity of ∆_{D_n}, the vectors {∂Ψ/∂t_j}_{j=1}^{n} belong to ∆_{D_n}; moreover, because they are linearly independent, they span ∆_{D_n}. Therefore the vectors ∇Φ^i, i = n + 1, ..., N, are orthogonal to every vector of ∆_{D_n}, i.e.
Now notice that Φ is (locally) invertible so it can be used as a (local) change of coordinates z = Φ(x). With these preliminaries in place, we have the following.
Proposition 4.8. Let ∆ be a smooth involutive distribution on R N and x 0 a regular point of ∆.
In particular, assume that there exists a neighbourhood of x_0 where the dimension of ∆ is n. Then there exists a change of coordinates Φ (defined locally) such that
i): a vector field V on R^N belongs to ∆ if and only if, in the coordinates defined by Φ, the last N − n components of V are zero;
ii): if ∆ is invariant under a vector field W then, in the coordinates defined by Φ, the last N − n components of W are functions independent of the first n coordinates. More explicitly, as per the notation introduced in (12), let and let W̃ be the representation of W in the new coordinates. Then

Proof of Proposition 4.8. The proof is deferred to Appendix B.1.
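Proposition 4.8 can be sanity-checked numerically on a hypothetical pair of fields (the same ones that will appear in Example 4.10): on R^2 minus the origin, take ∆ generated by V_1(x, y) = (x, y), which is invariant under V_0(x, y) = (−y, x), and the candidate change of coordinates Φ(x, y) = (½ log(x² + y²), atan2(y, x)) (log-radius and angle, so n = 1, N = 2). Pushing the fields forward by Φ with finite differences, one finds Ṽ_1 = (1, 0), whose last component vanishes as in item i), and Ṽ_0 = (0, 1), whose last component is constant, as in item ii):

```python
import math

# Hypothetical fields anticipating Example 4.10: V1 spans the distribution,
# and V0 leaves it invariant (rotations map radial rays to radial rays).
def V1(p):
    x, y = p
    return (x, y)

def V0(p):
    x, y = p
    return (-y, x)

def Phi(p):
    # Candidate change of coordinates: (log-radius, angle).
    x, y = p
    return (0.5 * math.log(x * x + y * y), math.atan2(y, x))

def push(V, p, h=1e-6):
    # Pushforward (J_p Phi) V(p), via central differences along the direction V(p).
    v = V(p)
    pp = (p[0] + h * v[0], p[1] + h * v[1])
    pm = (p[0] - h * v[0], p[1] - h * v[1])
    return [(Phi(pp)[i] - Phi(pm)[i]) / (2 * h) for i in range(2)]

p = (0.8, 0.3)
v1t, v0t = push(V1, p), push(V0, p)
print(abs(v1t[0] - 1) < 1e-6 and abs(v1t[1]) < 1e-6)  # True: V1-tilde = (1, 0), item i)
print(abs(v0t[0]) < 1e-6 and abs(v0t[1] - 1) < 1e-6)  # True: V0-tilde = (0, 1), item ii)
```

In these coordinates the last component of Ṽ_0 is the constant 1, independent of the first coordinate, exactly the structure exploited in the ODE+SDE decomposition below.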
We now want to apply Proposition 4.8 to the vector fields appearing in the SDE (1). We assume that such vector fields on R^N satisfy the UFG condition at level m. Let ∆_{n+1} and ∆_n be the distributions defined at the beginning of Section 4. We know that the rank of ∆_{n+1} is constant along the orbits of ∆_{n+1} (see the comment before Definition 3.10). Let x ∈ R^N and consider the orbit of ∆_{n+1} through x. In view of Lemma 4.1, if we assume that V_0^{(⊥)}(x) ≠ 0 then the rank of ∆_0 at x is exactly n + 1. Recall that N is fixed and is the dimension of the state space R^N, while n = n(x) is the dimension of the orbits of ∆ and is constant along each such orbit. Notice that ∆_n (and ∆_{n+1}) is also involutive by construction, so we can apply Proposition 4.8 to it.
With this in mind, let us describe the coordinate change. This is obtained by combining the following two steps.
• Step one: because ∆_0 = span(R_m, V_0) is the tangent space of an (n + 1)-dimensional submanifold of R^N, one can always locally express the vector fields V_0, ..., V_d as i.e. the last N − (n + 1) coordinates of the vectors Ṽ_j are simply zero.
• Step two: apply Proposition 4.8 using the distribution ∆_n (possibly only to the first n + 1 coordinates of the involved fields). Then, because V_1, ..., V_d belong to ∆_n and V_0 is invariant for ∆_n, one obtains, in the new local coordinates (and recalling the notation introduced in Section 2), where we keep the same notation Ṽ_j for the representation of the vector fields after this further change of coordinates.
This shows that, in the new coordinates, the vector fields V_0, ..., V_d take the form (13)-(14). We now want to express the SDE (1) in the new local coordinates. If X_t is the original process, Z_t is the process in the new coordinates. In particular where z_t ∈ R^n contains the first n coordinates of Z_t, ζ_t is the (n + 1)-th coordinate of the process and a contains the remaining N − (n + 1) components (which do not change in time, see below). Putting everything together and using the convention (13)-(14), one obtains that, in the new coordinates, the SDE (1) with initial datum Z_0 = (z_0, ζ_0, a_0) is simply Notice that from the above one can also deduce that, in the new coordinates, Ṽ_0^{(∆)} = (U_0, 0, ..., 0) while Ṽ_0^{(⊥)} = (0, ..., 0, W_0, 0, ..., 0). Assuming for the moment that at the initial point x = X_0 the dimension of ∆_0 is exactly n + 1, the fact that the last N − (n + 1) components of the dynamics remain constant reflects the fact that, at least for a short enough time, the solution of the SDE remains in the integral submanifold of ∆_{n+1} from which it started, coherently with Lemma 4.6 and Proposition 4.7.
If at the initial point the rank of ∆_{n+1} is exactly N, i.e. n + 1 = N, then one simply has and this time Ṽ_0 . In this simpler case it is clearer that we have locally reduced the SDE (1) to an ODE component, ζ_t (which evolves independently of all the other components), and an (N − 1)-dimensional SDE. We emphasize that, because the change of coordinates is local, such a representation will hold only for small enough t.
Example 4.9 (UFG-Heisenberg). Consider the following dynamics in R^3. This example was introduced in [12] and named the UFG-Heisenberg dynamics (as it comes from a modification of the Heisenberg group). This is already globally in the form ODE+SDE. The ODE for the first coordinate can be solved explicitly, giving X_t = x_0 e^{−t}. Therefore, if we start the dynamics at (x_0, y_0, z_0) with x_0 > 0 (x_0 < 0, respectively), then the system evolves (for all finite times) in the semispace with positive (negative, respectively) x-coordinate. If the initial datum is on the plane (0, y_0, z_0) then the dynamics remains confined to such a plane for all subsequent times. This is coherent with the following: for the above set of vector fields, one has ∆_0((x, y, z)) = R^3 if x > 0 or x < 0, while ∆_0((x, y, z)) is two-dimensional when x = 0. The distribution ∆_0 has three orbits, namely the sets As for the distribution ∆, this is two-dimensional at every point. Moreover, the orbit of ∆ through the point (b, y, z) is the plane x = b. For this reason, when working on this example we will simply denote by S_b the orbit through the point (b, y, z). In particular, notice that S_0 = S′_0.
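Only the explicitly solved first coordinate of the UFG-Heisenberg dynamics is needed to check the orbit structure numerically; a minimal sketch:

```python
import math

# The first coordinate of the UFG-Heisenberg dynamics decouples and solves
# the ODE x' = -x, so X_t = x0 * exp(-t), as stated in Example 4.9.
def X(t, x0):
    return x0 * math.exp(-t)

# The three orbits of Delta_0 -- {x > 0}, {x = 0} and {x < 0} -- are each
# invariant: a trajectory approaches the plane x = 0 only as t -> infinity.
for x0, orbit in ((1.0, 1), (0.0, 0), (-0.5, -1)):
    for t in (0.0, 1.0, 50.0):
        xt = X(t, x0)
        side = (xt > 0) - (xt < 0)   # sign of the first coordinate
        assert side == orbit
print("each semispace (and the plane x = 0) is invariant")
```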
Example 4.10. Consider the SDE (33)-(34) in R^2, where B_t is a one-dimensional Brownian motion. This system satisfies neither the HC nor the PHC; however, the UFG condition is satisfied at level m = 1. Indeed we have For every (x, y) ∈ R^2, ∆(x, y) = span{V_1(x, y)}; except for the origin, the orbits of ∆ are open radial half-lines. That is, S_{(x,y)} = {(0, 0)} if (x, y) = (0, 0) and S_{(x,y)} = {(sx, sy), s > 0} otherwise. Indeed, S_{(x,y)} coincides with the set of points accessible by the integral curves of V_1, which can be found explicitly: In this example the local change of coordinates in a neighbourhood of (1, 0) is given by the diffeomorphism After such a change of coordinates, the SDE (33)-(34) can be expressed, locally, as In Figure 1 below we plot the evolution of C_t, i.e. the solution of (33)-(34). From the plots it should be clear that (ζ_t, Z_t) are just the polar coordinates of the point C_t: ζ_t represents the angle, which evolves deterministically with a simple anticlockwise motion, while Z_t (or, to be more precise, exp(2Z_t)) is the radius, which changes randomly according to the SDE (36).
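The qualitative picture of Figure 1 can be reproduced with a few lines of code. We stress that the fields below, V_0(x, y) = (−y, x) and V_1(x, y) = (x, y), are our reading of the SDE (33)-(34), chosen to be consistent with the polar-coordinate description above. A Heun predictor-corrector scheme (which is consistent with Stratonovich integration) then exhibits the angle advancing deterministically, ζ_t = ζ_0 + t, while the log-radius follows the driving Brownian path:

```python
import math
import random

# Hypothetical reading of the SDE (33)-(34): dC = V0(C) dt + V1(C) o dB, with
# V0 = (-y, x) (rotation) and V1 = (x, y) (dilation). In polar coordinates the
# angle solves d(theta) = dt while the log-radius solves d(log r) = o dB.
def V0(x, y):
    return (-y, x)

def V1(x, y):
    return (x, y)

def heun_step(x, y, dt, dW):
    # Heun (predictor-corrector) step, consistent with Stratonovich calculus.
    ax, ay = V0(x, y)
    bx, by = V1(x, y)
    px, py = x + ax * dt + bx * dW, y + ay * dt + by * dW      # predictor
    ax2, ay2 = V0(px, py)
    bx2, by2 = V1(px, py)
    return (x + 0.5 * (ax + ax2) * dt + 0.5 * (bx + bx2) * dW,
            y + 0.5 * (ay + ay2) * dt + 0.5 * (by + by2) * dW)

random.seed(1)
T, n = 1.0, 20000
dt = T / n
x, y = 1.0, 0.0   # start at (1, 0): angle 0, log-radius 0
for _ in range(n):
    x, y = heun_step(x, y, dt, random.gauss(0.0, math.sqrt(dt)))

theta = math.atan2(y, x) % (2 * math.pi)
print(abs(theta - T) < 1e-2)  # True: the angle has advanced by exactly t = 1
```

The radius at time T is random (it equals exp of the Brownian path, up to discretisation error), while the angle is not: this is the ODE+SDE splitting of the previous subsection seen at work.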
Note 4.11. If the dimension n of ∆ were equal to N for every x ∈ R^N, this would imply that ∆(x) = ∆_0(x) for every x ∈ R^N. In particular, the Parabolic Hörmander Condition (PHC) would hold. This case is well studied in the literature and we do not wish to consider it here. For this reason many of the statements of this section are made under the assumption that n < N. We need to emphasize that it may happen that the two distributions coincide on a manifold (see Example 4.9, where the two distributions coincide on the plane x = 0); it may also happen that they both have full rank N on a manifold, while they differ on other manifolds (see Example 4.13 below). The case that is not interesting for our purposes is the one in which they coincide and have full rank on the whole of R^N. Most of our theorems do cover that case as well (unless otherwise explicitly stated), but they are not really conceived in that framework.

We recall (see Appendix A.2) that, under the UFG condition, the semigroup P_t is differentiable in the directions V_[α], α ∈ A_m. In particular, it may not be differentiable in the direction V_0. In view of the decomposition (9) and of the change of coordinates presented in this section, this result is quite intuitive, as we explain.

Figure 1. The process (X_t, Y_t) of Example 4.10, with initial condition (X_0, Y_0) = (1, 0). The angle of rotation evolves deterministically in counterclockwise sense, while the radius changes randomly, according to (36).
Indeed, the direction V_0 is inherently linked to the deterministic part of the system, which clearly doesn't provide any smoothing. This also explains why, while there is no smoothness in the direction V_0, the semigroup will always be smooth in the direction ∂_t − V_0 (to be more precise, in the direction ∂_t − V_0^{(⊥)}), as solutions of the ODE are constant in this direction. Finally, the deterministic part of the dynamics is responsible for the lack of density (i.e. for the fact that the law of the process does not admit a density on R^N). It is useful for the purposes of this discussion to point out that the one-dimensional transport equation is an extreme example of the UFG setting; that is, consider As is well known, the solution of such a PDE is just u(t, x) = f(x + t), hence no smoothing occurs in the space direction. However, the solution is smooth in the direction (∂_t − ∂_x) = ∂_t − V_0, as it is constant in such a direction. Therefore, UFG diffusions include a vast range of behaviours, from smooth elliptic diffusions to deterministic equations.
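The transport-equation claim is immediate to verify numerically: with u(t, x) = f(x + t), the two difference quotients below coincide (so u_t = u_x), and u is exactly constant along the direction ∂_t − ∂_x. Here f = sin is an arbitrary illustrative profile:

```python
import math

# u(t, x) = f(x + t) solves the transport equation u_t = u_x: no smoothing in x,
# but u is constant along the characteristic direction (t, x) -> (t + s, x - s).
f = math.sin                 # any smooth (or merely continuous) profile works
u = lambda t, x: f(x + t)

h = 1e-6
t0, x0 = 0.3, 1.1
du_dt = (u(t0 + h, x0) - u(t0 - h, x0)) / (2 * h)
du_dx = (u(t0, x0 + h) - u(t0, x0 - h)) / (2 * h)
print(abs(du_dt - du_dx) < 1e-9)                      # True: u_t = u_x
print(abs(u(t0 + 0.5, x0 - 0.5) - u(t0, x0)) < 1e-12) # True: constant along d/dt - d/dx
```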
Then ∆(x, y) = ∆_0(x, y), and they are both two-dimensional, for every (x, y) ∈ A^c, while ∆(x, y) ≠ ∆_0(x, y) for every (x, y) ∈ A, as on this set ∆_0 is one-dimensional while ∆(x, y) = {0} for every (x, y) ∈ A. After the change of coordinates the coefficients of the vector fields (in the new coordinates) may grow more than linearly, but they will still be smooth. Hence, in the neighbourhood in which they are defined, the vector fields will still be locally Lipschitz. The situation is more delicate for the vector field V_0^{(⊥)} as well, see Example 7.10. Whenever this may cause issues, we will assume that V_0^{(⊥)} is at least such that the integral curve of V_0^{(⊥)} through a given point is unique and well defined (at least on the relevant manifolds).
We conclude this section by stating a couple of technical lemmata which will be useful in the following. The statement of Lemma 4.15 would clearly not be true if S and S′ were two arbitrary sets; it only holds because of the particular structure of the integral manifolds of ∆ and ∆_0. As a side remark, notice that while S ⊆ S′ implies ∂S ⊆ ∂S′, it is not the case, in general, that the boundary of S′ is the union of the boundaries of the orbits of ∆, see Example 4.9.
Lemma 4.16. With the notation introduced so far, assume the vector fields V_0, ..., V_d satisfy the UFG condition. Let x_0 ∈ R^N and recall that x_0 belongs to exactly one integral manifold of ∆_0, the manifold S′_{x_0}. Consider the vector field V_0^{(⊥)} (defined in (9)) and assume such a vector field is smooth. Then either V_0^{(⊥)}(x) = 0 for every x ∈ S′_{x_0} or V_0^{(⊥)}(x) ≠ 0 for every x ∈ S′_{x_0}.

Proof of Lemma 4.16. The proof is deferred to Appendix B.1.

5. Qualitative Results on UFG diffusions
In this section we study the behaviour of the diffusion X_t solving (1) under the sole assumption that the vector fields V_0, ..., V_d appearing in (1) satisfy the UFG condition. As observed also in [12, Note 4.3], under the sole UFG condition one cannot expect to make any quantitative deductions on the behaviour of the process X_t. Neither can one expect the UFG condition itself to imply any results about existence or uniqueness of invariant measures, as there are many elliptic diffusions that don't have an invariant measure (the simplest example being Brownian motion on R). In order to study invariant measures and decay to equilibrium we will have to make further assumptions. Nonetheless, the geometric considerations made in the previous sections allow us to prove several qualitative statements on the behaviour of the diffusion. The main results of this section are Proposition 5.1, Proposition 5.3 and Proposition 5.7. Collectively, these three results impart a great deal of intuition about UFG dynamics and contain a lot of useful information. After each of these three statements we have inserted a note commenting on its meaning, see Note 5.2, Note 5.4 and Note 5.8. The results of Section 6 and Section 7 rely heavily on the statements of this section.
Recall that we denote by S (S′, respectively) a generic integral manifold of the distribution ∆ (∆_0, respectively). Consistently, S_x (S′_x, respectively) denotes the integral manifold of ∆ (∆_0, respectively) through the point x ∈ R^N.

Proposition 5.1. Assume that the vector fields V_0, V_1, ..., V_d satisfy the UFG condition and let X_t be the solution of the SDE (1). Let S′ be a maximal integral manifold of ∆_0 and let ∂S′ be the boundary of S′, i.e. ∂S′ := S̄′ \ S′. Then the following holds:
i): if ∂S′ is not empty, it is a union of integral submanifolds of ∆_0;
ii): if X_0 = x ∈ ∂S′ then X_t ∈ ∂S′ for all t > 0 (almost surely).
Proof of Proposition 5.1. The proof is deferred to Appendix B.1.
Note 5.2. Let us explain the meaning and consequences of Proposition 5.1. Suppose we start the SDE (1) at x ∈ R^N. Because the integral manifolds of ∆_0 partition R^N, x belongs to one of such integral manifolds, the one which we denote by S′_x. As a consequence of Proposition 4.7 we know that the process will never leave the closure of S′_x; however, if it started in the interior, it could in principle hit the boundary (which is a union of manifolds whose dimension is lower than the dimension of S′_x) and then come back to the interior. What we prove here is that this is not possible. Furthermore, because the boundary of S′_x is itself a union of integral manifolds of ∆_0, one could repeat the previous reasoning once the process enters the boundary (if this is the case). As a result of iterating this line of thought, we have that, along the path of X_t^{(x)}, the rank of the distribution ∆_0 can only decrease (or stay the same). In other words, we have shown that for every

Before stating the next result, we recall that the vector V_0^{(⊥)} has been defined in (9).
This will be done by considering the control problem associated with the SDE (1) and by using the Stroock and Varadhan support theorem. We postpone this part of the proof to Appendix B.1.

Note 5.4. Suppose V_0^{(⊥)}(x) ≠ 0 for every x in S′_{x_0}, x_0 being the starting point of the SDE (1). We already know by Proposition 4.7 that X_t^{(x_0)} will not leave the closure of S′_{x_0}, so that we can consider the closure of S′_{x_0} to be the state space of the dynamics. As already observed before Proposition 4.3, every x ∈ S′_{x_0} belongs to exactly one orbit S of ∆ and, moreover, the union of the manifolds {S_x}_{x ∈ S′_{x_0}} gives precisely S′_{x_0}. In other words, the orbits of ∆ contained in S′_{x_0} partition S′_{x_0}. Furthermore, because V_0^{(⊥)} ≠ 0 on S′_{x_0} and the rank of ∆_0 is constant on S′_{x_0}, one has (see Lemma 4.1) that if S′_{x_0} has rank n + 1 then every orbit S_x, x ∈ S′_{x_0}, will be a manifold of dimension n. In particular, there is no x ∈ S′_{x_0} such that S_x = S′_x (so that the partition of S′_{x_0} into orbits of the distribution ∆ is not the trivial one). With this premise, it makes sense to ask the following question: while we know that the process will not leave the closure of S′_{x_0} for every t ≥ 0, if we fix an arbitrary positive time t > 0, can we tell more precisely where, within S′_{x_0}, X_t^{(x_0)} is? In particular, can we determine which submanifold S it belongs to, i.e. which element of the partition of S′_{x_0} is visited at time t ≥ 0? The answer, given by Proposition 5.3, is the following: let y = e^{tV_0^{(⊥)}} x_0. Then, while x_0 ∈ S_{x_0}, X_t belongs to (the closure of) S_y. In other words, the vector V_0^{(⊥)} will make the SDE move from one submanifold of the partition (of S′_{x_0}) to another. Another question is whether X_t may visit one of such submanifolds twice, or whether, once one of these submanifolds has been visited, it will never be hit again. Example 5.6 below shows that the submanifolds of the partition can be visited an arbitrary number of times.
Example 5.5. Recall the UFG-Heisenberg SDE introduced in Example 4.9. In this case V_0^{(⊥)} = (−x, 0, 0). To fix ideas, let (x_0, y_0) = (1, 0) be the initial condition of the SDE; then the integral curve of V_0^{(⊥)} started at this point is (cos(t), sin(t)), and one can again explicitly verify that, for every t > 0, X_t belongs to S_{(cos(t), sin(t))}.
Proposition 5.7. With the notation introduced so far, assume the vector fields V 0 , . . ., V d satisfy the UFG condition and let . Then, for any invariant measure µ of the SDE (1) (should at least one exist), we have µ(E t ) = 0 for every t > 0. As a consequence, µ(E) = 0 as well.
Proof of Proposition 5.7. The proof is deferred to Appendix B.1.
Note 5.8. Informally, Proposition 5.7 says that any invariant measure (should at least one exist) gives zero weight to the set of points that, under the action of the dynamics prescribed by the SDE (1), leave in finite time the submanifold from which they start. That is, the set of points x such that X_t^{(x)} ∉ S_x for some time t > 0 has µ-measure zero. In view of Proposition 5.1 this result is intuitive: in general, when the dynamics leaves a set it may return to that set infinitely many times (when this happens the set is said to be recurrent). Because along the trajectories of X_t^{(x)} the rank of the distribution can only decrease, if the process X_t^{(x)} leaves the integral manifold S_x from which it started, it will never return to it. The dynamics will therefore spend an infinite amount of time outside the manifold S_x, so that the invariant measure, if it exists, can only be supported outside such a manifold. In other words, the theorem says that an integral submanifold S is a recurrent set if and only if the process never leaves it (once it enters it). This argument constitutes an informal proof of the theorem. Notice also that this theorem doesn't say anything about, say, Geometric Brownian motion (see Example 3.2) or the UFG-Heisenberg process of Example 4.9, as those dynamics only leave the initial submanifold in infinite time; for any finite time they stay in the submanifold from which they started.
Lemma 5.9. Assume the vector fields V 0 , . . ., V d appearing in (1) satisfy the UFG condition and that the Obtuse Angle Condition (18) holds. Let P t be the semigroup defined in (2). Then, given a maximal integral submanifold S of∆, we have for all f ∈ C b (R N ) and x, y ∈ S.
Proof of Lemma 5.9. The proof is deferred to Appendix B.1.

Proposition 5.10. Consider the assumptions and setting of the previous lemma and let S be a maximal integral manifold of ∆. Then, among all the invariant measures µ of (1) (assuming at least one such measure exists), there exists at most one such that µ(S) = 1. Moreover, if such a measure exists, then it is ergodic (in the sense that P_t 1_E = 1_E for some Borel set E implies that µ(E) = 1 or 0) and for every x ∈ S and f ∈ C_b(R^N) we have [...].

Proof of Proposition 5.10. The proof is deferred to Appendix B.1.
6. Long-time behaviour of UFG processes: the case of "non-autonomous hypoelliptic diffusions" In this section we set N = n + 1 and study stochastic dynamics in R^N = R^{n+1} of the form (42)-(43) below; in other words, we consider systems for which the representation of the form "ODE+SDE" (31)-(32) is global.13 The system consists of an n-dimensional process Z_t ∈ R^n, satisfying the SDE (42), coupled with the one-dimensional autonomous ODE (43). As in previous sections, U_j : R^n × R → R^n, j ∈ {0, . . ., d}, and W_0 : R → R. The evolution of Z_t depends on the evolution of ζ_t, but the ODE solution ζ_t evolves independently of the SDE. For the purposes of this paper, we don't think of ζ_t as representing time, but rather as representing an additional space coordinate. Notice, however, that if W_0 ≡ 1 and ζ(0) = 0 then ζ_t = t and we recover a standard time-inhomogeneous setting, i.e. in this case (42) becomes a general time-inhomogeneous SDE, namely (45). 13 We are not claiming that this representation necessarily results from the change of coordinates presented in Section 4.2.
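To make the structure of the "ODE+SDE" system concrete, the following sketch simulates a coupled pair of the form (42)-(43) with an Euler-Maruyama scheme. The specific coefficients U_0, U_1, W_0 below are our own illustrative choices, not taken from the paper; the point to notice is that the ζ-component can be updated without ever looking at Z or at the noise, while Z feels ζ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative coefficients (our own choices, not from the paper): the
# SDE component may depend on zeta, but the ODE component is autonomous.
def W0(zeta):                    # drift of the one-dimensional ODE (43)
    return -zeta                 # solution: zeta_t = zeta_0 * exp(-t)

def U0(z, zeta):                 # drift of the SDE (42)
    return -z + zeta

def U1(z, zeta):                 # diffusion coefficient of the SDE (42)
    return 1.0 + 0.5 * np.sin(zeta)

def euler_maruyama(z0, zeta0, T=5.0, n=5000):
    dt = T / n
    z, zeta = z0, zeta0
    for _ in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))
        z = z + U0(z, zeta) * dt + U1(z, zeta) * dB   # Z_t feels zeta_t ...
        zeta = zeta + W0(zeta) * dt                   # ... but not vice versa
    return z, zeta

z_T, zeta_T = euler_maruyama(1.0, 2.0)
print(z_T, zeta_T)
```

Whatever noise path is drawn, zeta_T is deterministic: it agrees with the exact ODE solution 2e^{-5} up to the Euler discretisation error, which is exactly the independence property discussed above.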
Going back to the "ODE+SDE" representation (42)-(43) under consideration, if we denote by X_t the R^N-valued process X_t = (Z_t, ζ_t), then X_t is the solution of an autonomous SDE. The one-parameter semigroup associated to X_t is, as usual, given by (2). On the other hand, one could consider the two-parameter semigroup associated with the non-autonomous process Z_t alone. Indeed, if we solve the ODE for ζ_t and substitute the solution back into the SDE for Z_t, then we can simply consider equation (42) rather than the whole system. To be more precise, let us denote by ζ_t^ζ the solution at time t of (43) with initial datum ζ(0) = ζ; that is, ζ_t^ζ = e^{tW_0}ζ. Let also Z_t^{s,z,ζ} be the solution of the following SDE: [...]. The two-parameter semigroup associated with the above non-autonomous SDE is given by [...]. We emphasize that this two-parameter semigroup depends on ζ, i.e. on the initial datum of the ODE. When we do not wish to stress this dependence we simply write Q_{s,t}. With this notation, one can equivalently rewrite the definition of P_t as [...]. To make explicit the relation between the two-parameter semigroup Q_{s,t}^ζ and the one-parameter semigroup P_t, fix s ∈ R and let ζ̄ = e^{sW_0}ζ. Notice that Z_t^{0,z,ζ̄} = Z_{t+s}^{s,z,ζ}, where the equality is understood in law. Therefore, for every f ∈ C_b(R^{n+1}; R) and z ∈ R^n, we have (P_t f)(z, ζ̄) = (Q_{s,s+t} f(·, ζ_{t+s}^ζ))(z). (47) On the right-hand side, the semigroup Q acts on the function f(·, a) obtained by freezing the value a of the last coordinate of the argument.
From now on, unless otherwise specified, we write Z_t for Z_t^{0,z,ζ_0}. With this set-up in place, we can start commenting on the long-time behaviour. Heuristically, if the solution of the ODE (43) is unbounded, then one cannot expect the process X_t to have an invariant measure (see Proposition 6.8), though the process Z_t may still admit one. We therefore restrict to the case in which the solution of the ODE is bounded. However, because (43) is a one-dimensional time-homogeneous ODE, if ζ_t is bounded then it can only increase or decrease monotonically towards a stationary point of the dynamics (a stationary point of the ODE (43) is a point ζ̄ ∈ R such that W_0(ζ̄) = 0); we emphasise that there may be many such points. For these reasons, we work under the assumption that ζ_t admits a finite limit, i.e. we assume that the initial datum ζ_0 ∈ R is such that there exists a point ζ̄ = ζ̄(ζ_0) ∈ R with lim_{t→∞} ζ_t = ζ̄. (48) As customary, the notation ζ̄ = ζ̄(ζ_0) emphasises that the limit point depends on the initial datum (when we don't wish to stress such a dependence we simply denote a stationary point of the ODE by ζ̄). The dynamics (42)-(43) will, in general, admit several invariant measures. As pointed out in the introduction, when this is the case it is typically extremely difficult to determine the basin of attraction of each invariant measure. However, in the setting of this section the basin of attraction of a given invariant measure will depend only on the behaviour of the ODE. (In the next section we will show that, despite the fact that the representation of the form "ODE+SDE" is only local for generic UFG processes, it is still the case that we can relate in a simple way the initial datum to the invariant measure to which the process is converging.) Given an initial datum ζ_0 for (43), let ζ̄ = ζ̄(ζ_0) be the corresponding limit point of the ODE dynamics, as in (48).
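The mechanism behind (48), namely that a bounded trajectory of a one-dimensional autonomous ODE is monotone and converges to a stationary point depending on the initial datum, can be seen numerically. The drift W_0 below is a hypothetical choice of ours with several stationary points.

```python
import numpy as np

def flow_limit(zeta0, W0, T=50.0, n=50000):
    """Euler integration of the autonomous ODE d(zeta)/dt = W0(zeta), cf. (43)."""
    dt = T / n
    zeta = zeta0
    for _ in range(n):
        zeta = zeta + W0(zeta) * dt
    return zeta

# A hypothetical drift with several stationary points: W0(z) = z - z^3
# vanishes at -1, 0, 1; the outer two are stable, 0 is unstable.  A bounded
# trajectory is monotone, so it converges, and the limit point zeta_bar
# depends only on the initial datum:
W0 = lambda x: x - x**3
print(flow_limit(0.5, W0), flow_limit(-0.5, W0), flow_limit(0.0, W0))
```

Initial data in (0, 1) and (-1, 0) are attracted to different stationary points, a first instance of the "basin of attraction determined by the ODE" phenomenon discussed above.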
Consider the SDE (49), with associated semigroup Q̄_t. We will assume that the dynamics (49) is such that the semigroup Q̄_t admits a unique invariant measure, μ̄ = μ̄(ζ̄, ζ_0) (see Lemma 6.4). We emphasise that the asymptotic behaviour of Z̄_t is independent of the initial datum z̄, see Lemma 6.4. In view of (48), it is reasonable to guess that the asymptotic behaviour of Z_t = Z_t^{0,z,ζ_0} is the same as the asymptotic behaviour of Z̄_t, the solution of (49). This is the content of Theorem 6.5 below. Theorem 6.5 and Theorem 6.6 are the main results of this section; the former is concerned with the asymptotic behaviour of the semigroup Q_{s,t}, the latter describes the related asymptotic behaviour of the semigroup P_t. We first set out the assumptions used in the rest of this section and comment on their significance in Note 6.2.
Hypothesis 6.1. With the notation introduced so far, we will consider the assumptions [H.1]-[H.3] below.

Note 6.2. • We start by remarking on the obvious fact that if X_t = (Z_t, ζ_t), where Z_t, ζ_t are as in (42)-(43), then X_t solves an SDE of the form (1), with V_0 = (U_0, W_0), V_1 = (U_1, 0), . . ., V_d = (U_d, 0). • With the notation of Section 4 and Section 5, assumption [H.1] implies that the distribution ∆(x) is n-dimensional for every x ∈ R^N, with n = N − 1. In the setting of this section, this is the maximum rank that the distribution ∆ can have (as V_0 = (U_0, W_0) does not lie in R^n × {0} when W_0 ≠ 0). In other words, for every x ∈ R^N, the integral manifolds S_x of ∆(x) are (N − 1)-dimensional manifolds. Because of the particularly simple structure of the SDE, such manifolds are just hyperplanes: for x = (z, ζ), S_x = S_{(z,ζ)} = {u ∈ R^{n+1} : u = (z', η), η = ζ, z' ∈ R^n}. In this explicit setting Proposition 5.3 is easy to check.
• To reconcile the present work with the framework of [7] and further elaborate on the meaning of Hypothesis [H.1], let us assume for the moment that W_0 ≡ 1 and that ζ(0) = 0, so that (42) becomes a standard time-inhomogeneous SDE of the form (45). In this case the vector fields U_0, . . ., U_d are R^n-valued maps whose coefficients depend on time, i.e. (z, t) → U_j(z, t) ∈ R^n. For simplicity, let also n = 1. Then V_0 acts both on space and time, while V_1, . . ., V_d act on the space coordinate z only. That is, the relevant Lie algebra of iterated brackets (up to level k − 1) should be finitely generated and should span R^n_z, for every (z, t) ∈ R^n × R.
Let us now go back to the general "ODE+SDE" representation (42)-(43), without assuming W_0 ≡ 1. Recall that in this context the vector fields U_j are R^n-valued functions of n + 1 variables; that is, we view them as maps R^n × R ∋ (z, ζ) → U_j(z, ζ) ∈ R^n. Set again n = 1 just for simplicity (everything we write in this comment holds in general anyway). Then, as differential operators, U_0, . . ., U_d act only on the variable z, while W_0 acts only on the variable ζ, i.e. we have the correspondence [...]. One has [...]. If we calculate the second term on the right-hand side of the above along a solution ζ_t of the ODE, we obtain W_0(ζ_t)(∂_ζ U_j(z, ζ_t))∂_z = ∂_t(U_j(z, ζ_t))∂_z. This suggests that we may evaluate the vector fields along the solution of the ODE and then think of them as functions of z and time t, rather than as functions of z and ζ, i.e. R^n_z × R_t ∋ (z, t) → U_j(z, ζ_t^ζ) ∈ R^n, j ∈ {0, . . ., d}. If we do so, then Hypothesis [H.1] can be equivalently rephrased as follows: the Lie algebra is finitely generated and spans R^n_z for every z ∈ R^n and along any solution ζ_t of the ODE (43).14 • As is well known, Hypothesis [H.2] is implied by a Lyapunov-type condition; namely, if there exists a non-negative function ϕ ∈ C^2(R^n) with compact level sets and such that (51) holds, then [H.2] is satisfied. Here L_t is the operator [...]. 14 Given an initial datum, the solution of the ODE is unique. When we say that this should hold along any solution, we mean along all the solutions that one can obtain by starting from different initial data.
where ∇ = (∂_{z_1}, . . ., ∂_{z_n}). We remark that the converse implication does not hold: [H.2] does not imply the existence of a Lyapunov function satisfying condition (51). Note 6.3. As already pointed out, if X_t = (Z_t, ζ_t), where Z_t, ζ_t are given by the "ODE+SDE" representation (42)-(43), then X_t solves an SDE of the form (1) (see also (9)). We note in passing that in this case one has [...] (this is not of much use at the moment, but it will help at the beginning of Section 7 to make a link between the setting of this section and the setting of the next). Therefore, while X_t belongs to the hyperplane H_{ζ_t} := {x ∈ R^{n+1} : x = (z, ζ_t), z ∈ R^n} for each t ≥ 0, Z_t remains, for every t ≥ 0, on the same hyperplane, namely H_{ζ_0} := {x ∈ R^{n+1} : x = (z, ζ_0), z ∈ R^n} (which is precisely the manifold S_{x_0} = S_{(z_0,ζ_0)}, see the second bullet point in Note 6.2).
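Returning to the Lyapunov-type condition discussed in Note 6.2: the display (51) is not reproduced in this extract, but assuming it takes the standard drift form L_t ϕ ≤ −cϕ + C, it can be verified symbolically in a simple case. The following sketch uses the operator L_t = −kz ∂_z + ζ_t² ∂_z² that appears later in Example 6.10, with the ζ-coordinate frozen at a symbolic value.

```python
import sympy as sp

z = sp.symbols('z', real=True)
k, zeta = sp.symbols('k zeta', positive=True)

# Generator of a hypothetical Ornstein-Uhlenbeck-type dynamics with the
# zeta-coordinate frozen: L_t f = -k z f'(z) + zeta^2 f''(z) (the operator
# appearing in Example 6.10 below).
f = z**2                                  # candidate Lyapunov function
Lf = -k * z * sp.diff(f, z) + zeta**2 * sp.diff(f, z, 2)

# L_t f = -2 k f + 2 zeta^2: if zeta_t stays bounded, this is a drift
# condition of the assumed form L_t f <= -c f + C, which yields [H.2].
assert sp.simplify(Lf - (-2 * k * f + 2 * zeta**2)) == 0
print(sp.expand(Lf))
```

The computation shows why boundedness of ζ_t matters: the constant term 2ζ_t² must stay bounded along the ODE trajectory for the drift condition to hold uniformly in t.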
Lemma 6.4. Let Hypothesis 6.1 hold. Then the semigroup Q̄_t associated with (49) admits a unique invariant measure μ̄ and lim_{t→∞}(Q̄_t g)(z̄) = ∫_{R^n} g dμ̄, for every z̄ ∈ R^n and every g ∈ C_b(R^n).
Proof of Lemma 6.4. This is completely standard and we omit it; see for example [15]. We just point out that the existence of the invariant measure comes from assumption [H.2] and the uniqueness is a consequence of Hypothesis [H.3] and Proposition 5.10.

Theorem 6.5. Let Hypothesis 6.1 hold. In particular, let ζ̄ = ζ̄(ζ_0) be a stationary point for the ODE (43) and μ̄ be the invariant measure of the process (49). Then, for every s ≥ 0, lim_{t→∞} (Q_{s,t}^{ζ_0} g)(z) = ∫_{R^n} g(z) μ̄(dz), for every z ∈ R^n and every g ∈ C_b(R^n).
The proof of this theorem can be found after the statement of Theorem 6.6. Theorem 6.5 describes the asymptotic behaviour of the process Z_t. However, in this paper we are interested in the process X_t, whose long-time behaviour is described by Theorem 6.6 below, a straightforward consequence of Theorem 6.5. In order to state Theorem 6.6, we clarify the following: while Z_t is a process in R^n with invariant measure(s) μ̄ = μ̄(ζ̄, ζ_0) supported on R^n, X_t is a process in R^{n+1}; so, strictly speaking, any invariant measure of X_t is a probability measure on R^{n+1}. However such a measure is supported on the n-dimensional hyperplane H_{ζ̄} and is just a trivial extension of the measure μ̄. That is, let µ = µ(ζ̄, ζ_0) be the measure on R^{n+1} such that µ(A) = μ̄(A ∩ H_{ζ̄}) for every Borel set A ⊆ R^{n+1}.
We now introduce some definitions that will be needed for the proof of Theorem 6.5. A family {ν_t}_{t≥0} of probability measures on R^n is said to be an evolution system of measures for the two-parameter semigroup Q_{s,t} if for all 0 ≤ s ≤ t and g ∈ C_b(R^n) we have (53). Let Q*_{s,t} denote the adjoint of Q_{s,t}, acting on probability measures; then we can write (53) as Q*_{s,t} ν_s = ν_t, for all 0 ≤ s ≤ t. Further background on evolution systems of measures can be found in [14, 30].
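The identity Q*_{s,t} ν_s = ν_t can be checked concretely for a time-inhomogeneous Ornstein-Uhlenbeck equation, for which Q*_{s,t} maps Gaussian laws to Gaussian laws with explicit mean and variance. The constants K and the diffusion coefficient sigma below are our own illustrative choices.

```python
import numpy as np

# For the equation dZ_t = -K Z_t dt + sigma(t) dB_t, the law of Z_t given
# Z_s ~ N(m, v) is Gaussian with mean m e^{-K(t-s)} and variance
# v e^{-2K(t-s)} + int_s^t sigma(u)^2 e^{-2K(t-u)} du.
K = 2.0
sigma = lambda t: np.exp(-t)

def push(m, v, s, t, n=20001):
    """Mean and variance of Q*_{s,t} N(m, v)."""
    u = np.linspace(s, t, n)
    f = sigma(u)**2 * np.exp(-2 * K * (t - u))
    V = 0.5 * np.sum(f[:-1] + f[1:]) * (u[1] - u[0])   # trapezoid rule
    return m * np.exp(-K * (t - s)), v * np.exp(-2 * K * (t - s)) + V

# nu_t := Q*_{0,t} delta_{z0}; composing 0 -> s -> t must agree with 0 -> t,
# which is exactly the evolution-system identity Q*_{s,t} nu_s = nu_t.
z0, s, t = 1.0, 0.7, 1.9
m_s, v_s = push(z0, 0.0, 0.0, s)
m_comp, v_comp = push(m_s, v_s, s, t)
m_dir, v_dir = push(z0, 0.0, 0.0, t)
print(m_comp - m_dir, v_comp - v_dir)   # both negligibly small
```

This is a scalar instance of the construction used in Step 1 of the proof below: starting from a Dirac mass and pushing it forward produces an evolution system of measures.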
Proof of Theorem 6.5. The proof is in three steps.
• Step 1: We first construct a tight evolution system of measures, {ν_t}_{t≥0}, for the semigroup Q_{s,t}. To this end, take any point z_0 ∈ R^n, define ν_0 = δ_{z_0} and then let ν_t := Q*_{0,t} ν_0. Then {ν_t} is an evolution system of measures; indeed, Q*_{s,t} ν_s = Q*_{s,t} Q*_{0,s} ν_0 = Q*_{0,t} ν_0 = ν_t. (A more general construction of evolution systems is given in [30, Section 5].) To see that {ν_t}_{t≥0} is tight, fix ε > 0; by Hypothesis [H.2] we may take a compact set K_ε ⊂ R^n such that q_t^{0,z_0}(K_ε) ≥ 1 − ε for every t ≥ 0. By definition of ν_t we then have [...]. • Step 2: Q_{s,t}g(z) − ν_t(g) converges to zero as t tends to ∞, for all s ≥ 0, z ∈ R^n, g ∈ C_b(R^n). We defer the proof of this fact to Lemma B.1. Since {ν_t}_{t≥0} is tight, by Prokhorov's Theorem there exists a diverging sequence {t_ℓ} such that ν_{t_ℓ} converges weakly to some probability measure µ_0 as ℓ tends to ∞.
• Step 3: Show that µ_0 = μ̄. We defer the proof of this equality to Lemma B.4. Once we know that µ_0 = μ̄, the whole family ν_t converges weakly to μ̄ and the claim of the theorem follows; indeed, [...]. The first term converges to zero by Step 2 and the second term vanishes in the limit since ν_t converges weakly to μ̄ as t → ∞. Let p_t^x denote the measure defined by p_t^x(A) := P_t 1_A(x).

Proposition 6.8. If ζ_t^ζ → ∞ then the family of measures {p_t^{(z,ζ)}}_{t≥0} is not tight for any z ∈ R^n (hence, by Prokhorov's Theorem, there is no probability measure µ such that P_t f(z, ζ) → µ(f) for all f ∈ C_b(R^{n+1})).
Proof of Proposition 6.8. Fix z ∈ R^n and let x = (z, ζ) ∈ R^{n+1}. Assume by contradiction that {p_t^x}_{t≥0} is tight. Then, for any fixed ε > 0 there exists a compact set K_ε ⊂ R^{n+1} such that p_t^x(K_ε) > 1 − ε for all t ≥ 0. Since K_ε is compact we may take R sufficiently large that K_ε ⊆ R^n × [−R, R]; then (54) holds. However ζ_t^ζ → ∞, so we may take t sufficiently large that |ζ_t^ζ| > R. This contradicts (54); hence {p_t^x}_{t≥0} is not tight.
Example 6.9 (UFG-Grušin Plane). We give here a simple example of a process that satisfies the Obtuse Angle Condition but is not tight. Let d = 1, N = 2 and [...]. This corresponds to the SDE [...]. As a consequence of Proposition 6.8, the law of the whole process (Z_t, ζ_t) is not tight if k > 0. In this case the process Z_t, seen as a non-autonomous one-dimensional SDE, is not tight either when k > 0. Indeed, suppose by contradiction that [H.2] holds; then for any ε > 0 there exists R > 0 such that [...]. However, if Z_0 = z then Z_t is normally distributed with mean z and variance ζ^2(e^{2kt} − 1)/k, so we may write [...], where ξ is a one-dimensional standard normal random variable. Then we have [...], which contradicts (55). Note that if k = 0 then Z_t = √2 ζ B_t, which is not tight by a similar argument. However, if k < 0 then the process Z_t is tight. Indeed, assume k < 0; to see that {q_t^{0,z}}_{t≥0} is a tight family of measures, it is sufficient to apply a Lyapunov criterion and show that the function ϕ(z) = z^2 satisfies sup_t Q_{0,t}ϕ(y) < ∞. To prove the latter fact, observe that if (Z_s, ζ_0) = (z, ζ) then, by (56), we get [...]. If k < 0 we conclude that X_t = (Z_t, ζ_t) converges in distribution.
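The variance formula quoted in the example can be checked by Monte Carlo. The displayed equations are elided in this extract; our reading of the system, consistent with the variance ζ²(e^{2kt} − 1)/k and with the k = 0 case Z_t = √2 ζ B_t stated above, is dZ_t = √2 ζ_t dB_t with ζ_t = ζ e^{kt}.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of (our reading of) the UFG-Grusin system: dZ_t = sqrt(2)*zeta_t dB_t,
# zeta_t = zeta * exp(k t).  Stratonovich and Ito coincide here since the
# diffusion coefficient is deterministic.
def grusin_Z(z, zeta, k, T, n_steps=400, n_paths=20000):
    dt = T / n_steps
    Z = np.full(n_paths, z)
    for i in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt), n_paths)
        Z = Z + np.sqrt(2.0) * zeta * np.exp(k * i * dt) * dB
    return Z

z, zeta, k, T = 0.0, 1.0, 0.5, 1.0
Z_T = grusin_Z(z, zeta, k, T)
predicted_var = zeta**2 * (np.exp(2 * k * T) - 1) / k   # formula from the text
print(Z_T.var(), predicted_var)
```

For k > 0 the predicted variance grows like e^{2kt}, which is the quantitative source of the non-tightness argument above.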
Example 6.10. We conclude this section with an example which satisfies all the points of Hypothesis 6.1 in a non-trivial way, in the sense that it exhibits many invariant measures. Take k > 1 and consider the following SDE [...]. In this case [...]. Then we have [...]. Note that the function h(ζ) = sin(ζ)/ζ is bounded and smooth when extended to the origin with the value h(0) = 1, so the UFG condition is satisfied at level m = 1. Moreover, [...], and hence (18) is satisfied. To apply the results of Section 6 we must show that Hypothesis 6.1 holds. Note that the vector field V_1 is non-zero except when ζ = 0; therefore Hypothesis 6.1 [H.1] is satisfied wherever ζ ≠ 0. To show that Hypothesis 6.1 [H.2] holds we consider a function ϕ ∈ C^2(R) such that ϕ(z) = |z| for |z| > 1. Then, for |z| > 1, one has L_t ϕ(z) = −kzϕ'(z) + ζ_t^2 ϕ''(z) = −kz sign(z) = −kϕ(z). Therefore ϕ is a Lyapunov function, so by Note 6.2 the measures {q_t^{0,z} : t ≥ 0} are tight for any z ∈ R and Hypothesis 6.1 [H.2] is satisfied. We also have that ζ_t converges for any initial datum ζ ∈ R, and the limit ζ̄ is given by: 2nπ for ζ ∈ ((2n − 1)π, (2n + 1)π) for some n ∈ Z \ {0}; (2n + 1)π for ζ = (2n + 1)π for some n ∈ Z; 0 for ζ ∈ (−π, π).
Hence for ζ ∉ (−π, π) we may apply Theorem 6.6 to obtain that X_t = (Z_t, ζ_t) converges in distribution to a random variable which is distributed according to the unique invariant measure defined on the line R × {ζ̄(ζ_0)}. Moreover, for ζ = nπ for some n ∈ Z \ {0} we see that ζ_t ≡ ζ and Z_t satisfies the Ornstein-Uhlenbeck SDE [...]. In particular, in this case Z_t has a unique invariant measure, given by a Gaussian measure with mean 0 and variance ζ^2/k. Therefore for any n ∈ Z \ {0} and ζ ∈ ((2n − 1)π, (2n + 1)π) we have that X_t converges in distribution to ((2nπ/√k) ξ, 2nπ), where ξ is a one-dimensional standard normal random variable.15 15 Since Z_t satisfies a non-autonomous Ornstein-Uhlenbeck equation one can also study its asymptotic behaviour more directly, see e.g. [20].
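The limiting behaviour claimed in Example 6.10 can be observed numerically. The displayed equations are elided in this extract; in the sketch below we take W_0(ζ) = −sin(ζ), a sign convention under which the stable stationary points are the even multiples of π, matching the limits ζ̄ = 2nπ quoted above, and a diffusion coefficient √2 ζ_t matching the operator L_t = −kz ∂_z + ζ_t² ∂_z² used in the Lyapunov computation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of Example 6.10 with k = 2 (our sign/coefficient conventions, see
# lead-in).  zeta_0 = 5.0 lies in (pi, 3*pi), so zeta_bar should be 2*pi,
# and Z_t should approach a centred Gaussian with variance zeta_bar^2 / k.
k, T, dt = 2.0, 10.0, 0.005
n_steps, n_paths = int(T / dt), 20000

zeta = 5.0
Z = np.zeros(n_paths)
for _ in range(n_steps):
    dB = rng.normal(0.0, np.sqrt(dt), n_paths)
    Z = Z - k * Z * dt + np.sqrt(2.0) * zeta * dB   # dZ = -kZ dt + sqrt(2) zeta_t dB
    zeta = zeta - np.sin(zeta) * dt                 # autonomous ODE for zeta_t

print(zeta, Z.var(), (2 * np.pi)**2 / k)
```

The basin structure is visible here: which Gaussian the Z-component settles into is decided entirely by which interval ((2n − 1)π, (2n + 1)π) contains the initial datum of the ODE.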

7. Long-time behaviour of UFG diffusions: general case
In the previous section we investigated the case in which the "ODE+SDE" representation is global. In this section we study the general UFG case, in which such a representation is, in general, only local. That is, we finally address the full problem of analysing the asymptotic behaviour of (1), assuming that the vector fields V_0, . . ., V_d satisfy the UFG condition (see Definition 3.1). This case is substantially richer than the one considered in Section 6; however, the fact that we can always locally represent the SDE (1) as a system of the form "ODE+SDE" still means that we should be able to identify a suitable ODE which drives the dynamics. We will demonstrate that this is indeed the case and that such an ODE is given by the integral curves of the vector field V_0^{(⊥)}, namely (57). In the setting of the previous section, (57) substantially reduces to (43), up to observing that (57) is an N-dimensional ODE while (43) is a one-dimensional curve. We keep using the notation ζ_t for both curves only to emphasize the analogy; however, while the one-dimensional autonomous nature of the ODE (43) implies that its bounded solutions have a limit, the zoology of possible behaviours for the curve (57) is much more varied. In this paper we only analyse the case in which the curve (57) converges to a limit; we will treat the non-convergent case in future work. Roughly speaking, in Theorem 7.11 we prove that a necessary condition for the SDE (1) to have an invariant measure is that the ODE (57) should admit one as well (notice that if the curve (57) converges to a limit point x̄, then it admits the Dirac measure δ_{x̄} as an invariant measure).
As anticipated in the introduction, the above discussion motivates introducing the process Z_t := e^{−tV_0^{(⊥)}}(X_t^{(x_0)}). (58) Clearly Z_0 = x_0, so Z_t and X_t^{(x_0)} start from the same point. This process is time-inhomogeneous (as we show at the beginning of Section 7.2) and it will have a central role in what follows, hence further comments on the definition (58) are in order: • To continue drawing the useful parallel with Section 6, notice that this process plays here a role analogous to the one that Z_t (solution of (42)) has in Section 6, see Note 6.3. • Let us recall that if X_0 ∈ Ŝ_{x_0} then X_t ∈ Ŝ_{x_0} for every t ≥ 0 (see Proposition 4.7); more precisely, for every t ≥ 0, X_t belongs to the integral submanifold S_{y(t)} with y(t) = e^{tV_0^{(⊥)}} x_0 (see Section 5 for more precise comments on this). Therefore Z_t lives on the manifold S_{x_0}, for every t ≥ 0. So, in the end, while X_t takes values in Ŝ_{x_0}, Z_t takes values in S_{x_0} ⊆ Ŝ_{x_0}. One can informally think of Z_t as being a "projection" of X_t onto the submanifold S_{x_0} ⊆ Ŝ_{x_0}, see again Note 6.3. • Finally, on a small technical point: as we have already observed in Note 4.14, V_0^{(⊥)} may not be uniformly Lipschitz. To avoid problems of well-posedness and uniqueness, throughout this section we assume that V_0^{(⊥)} is indeed Lipschitz. We will show that the time-inhomogeneous process Z_t can be studied by means of slight modifications of the approach used in Section 6 to study the process (42). Therefore the strategy (and one of the main novelties) of this section is to use the auxiliary time-inhomogeneous process Z_t in order to make deductions on the behaviour of the time-homogeneous process X_t. We carry out this programme in Section 7.2 below. Before moving on, we give a simple example which demonstrates that Z_t ∈ S_{x_0} for every t ≥ 0 and, in Section 7.1, we gather further preliminary results on the process Z_t.

Example 7.1 (Random Circles continued). Consider again Example 4.10, in the case in which the initial datum is (x_0, y_0) = (1, 0).
Using (37) and (38)-(39), we have [...]. In particular, Z_t takes values in the positive half-line, which is precisely S_{(1,0)} = S_{(x_0,y_0)}.

7.1. The auxiliary process Z_t and its associated two-parameter semigroup. By differentiating (58) we see that Z_t satisfies the following SDE [...], where, as customary, we have set (Ad_{tV} Y)(x) := (J_x e^{−tV})(e^{tV}(x)) · Y(e^{tV} x), for any two smooth vector fields V and Y. By using (9), the elementary property Ad_{tV} V = V and introducing the notation V_{j,t} := Ad_{tV_0^{(⊥)}} V_j, we conclude that Z_t satisfies the following SDE with time-dependent coefficients: [...]. As usual, we denote by P_t the one-parameter semigroup associated with X_t; the two-parameter semigroup associated with Z_t is instead given by [...]. The semigroups Q_{s,t} and P_t are related as follows: [...]. We stress that {Q_{s,t}}_{0≤s≤t} is defined on S̄_{x_0} (as per Hypothesis 7.4 below). In (61) we consider functions g which are continuous up to and including the boundary of S_{x_0} for purely technical reasons (see proof of Proposition 7.3).
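The operator Ad_{tV} defined above, and the elementary property Ad_{tV} V = V used in the derivation, can be checked numerically in the simplest setting of linear vector fields, where everything is explicit. For a linear field V(x) = Ax the flow is e^{tV}(x) = expm(tA) and the Jacobian of e^{−tV} is expm(−tA), so the definition reduces to (Ad_{tV} Y)(x) = expm(−tA) Y(expm(tA) x). The rotation generator and the second field B below are our illustrative choices.

```python
import numpy as np

t = 0.8
A = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotation generator, V(x) = A x
R = np.array([[np.cos(t), -np.sin(t)],    # expm(tA) for this A
              [np.sin(t),  np.cos(t)]])
B = np.array([[1.0, 2.0], [0.0, -1.0]])   # an arbitrary second field, Y(x) = B x

x = np.array([0.3, -1.1])
V = lambda y: A @ y
Y = lambda y: B @ y
Rinv = R.T                                 # for rotations, expm(-tA) = R^T

ad_V = Rinv @ V(R @ x)                     # (Ad_{tV} V)(x): equals V(x), since
                                           # A commutes with its own flow
ad_Y = Rinv @ Y(R @ x)                     # (Ad_{tV} Y)(x): differs from Y(x),
                                           # since B does not commute with A
print(np.allclose(ad_V, V(x)))
```

The same computation illustrates why the time-dependent fields V_{j,t} genuinely differ from the V_j in general: conjugation by the flow of V_0^{(⊥)} is trivial only on fields commuting with it.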
In Proposition 7.3 we make some clarifications on the smoothing properties of the semigroup Q s,t .
To state such a result, we need to properly formulate some preliminary facts. Consider the following "hierarchy" of operators (cf. (59)): [...]. For each α ∈ A we can view the vector field (z, t) → V_{[α],t}(z) either as a vector field on R^N whose coefficients depend on time, or as a vector field on R^N × R. We can define the UFG condition for vector fields in R^N × R in a way analogous to Definition 3.1.
Proof of Proposition 7.2. The proof is deferred to Appendix B.3.

Recall from Section 3 that the map [...] is smooth in the directions V_{[α]}, α ∈ A_m. In Proposition 7.3 we show that, for each fixed s < t, the map z ∈ S_{x_0} → Q_{s,t}g(z) is also smooth in the directions V_{[α],s}, for any g ∈ C_b(S̄_{x_0}) and α ∈ A_m. A key observation to understand the statement of Proposition 7.3 is the following: if V ∈ ∆ and ∆ is invariant under the vector field W, then Ad_{tW} V ∈ ∆.16 (63) In particular, V_{j,t} ∈ ∆ for every j ∈ {0, . . ., d}.

Proposition 7.3. Assume that [...] is uniformly Lipschitz. Then, for any g ∈ C_b(S̄_{x_0}), the map (z, s) → Q_{s,t}g(z) is differentiable in the time variable s and in the spatial directions V_{[α],s}, for any z ∈ S_{x_0}, t > s, α ∈ A_m. Moreover Q_{s,t}g(z) satisfies the equation [...] for any z ∈ S_{x_0}, s < t.
Here L_s is the differential operator defined as [...], for ψ : S_{x_0} → R sufficiently smooth.
Proof of Proposition 7.3. The proof is deferred to Appendix B.3. 16 Indeed, by the definition of invariance (see Definition 3.7), we have that J_x e^{tW} maps ∆(x) to ∆(e^{tW}(x)).

7.2. Convergence to Equilibria.
We now turn to the asymptotic behaviour of the process Z_t. As we have already stated, we concentrate on the case in which the solution of the ODE (57) converges. Let us define the map W_∞ : Dom(W_∞) → R^N, W_∞(x) := lim_{t→∞} e^{tV_0^{(⊥)}}(x). Here Dom(W_∞) is the set of all points x ∈ R^N such that the integral curve e^{tV_0^{(⊥)}}(x) converges to a finite limit as t tends to ∞.
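The map W_∞ can be approximated numerically by following the integral curve of V_0^{(⊥)} for a long time. The two-dimensional drift below is a hypothetical choice of ours, built so that different initial data are sent to different limit points, i.e. so that they lie in different basins.

```python
import numpy as np

def W_inf(x0, V, dt=0.01, T=200.0):
    """Follow the integral curve e^{tV}(x0) up to time T by Euler steps;
    when the curve converges, this approximates W_inf(x0)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(int(T / dt)):
        x = x + dt * V(x)
    return x

# A hypothetical perpendicular drift with several attracting sets:
# the x-coordinate relaxes to 0, the y-coordinate to +1 or -1.
V = lambda x: np.array([-x[0], x[1] - x[1]**3])

print(W_inf([0.5, 0.2], V))    # close to (0, 1)
print(W_inf([0.5, -0.2], V))   # close to (0, -1)
```

In the notation of this section, the two limit points here would belong to different manifolds S_{x̄}, so the two initial data would be attracted to different invariant measures.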
Note 7.5. Some comments on the above assumptions, in the order in which they are stated.
• As a general premise, observe that, for every fixed t ≥ 0, X_t ∈ Ŝ_{x_0} if and only if Z_t ∈ S_{x_0}.
Indeed, X_t = e^{tV_0^{(⊥)}} Z_t, so if Z_t is in S_{x_0} then in particular it is in Ŝ_{x_0}, and X_t is obtained from it by moving along an integral curve of V_0^{(⊥)}; hence, by construction of the manifold Ŝ_{x_0}, X_t is still in Ŝ_{x_0}. The validity of the reverse implication can be argued similarly (using Lemma 4.15 and Proposition 5.3 as well). As a consequence, if Z_t doesn't hit the boundary of S_{x_0} in finite time then X_t doesn't hit the boundary of Ŝ_{x_0} in finite time. • Hypothesis 7.4 [A.4] implies that Z_t ∈ S_{x_0} almost surely, for every t ≥ 0, i.e. it implies that Z_t doesn't hit the boundary of S_{x_0} in finite time. Indeed, assume by contradiction that there exists t_0 > 0 such that P(Z_{t_0} ∈ ∂S_{x_0}) =: ε > 0, where ∂S_{x_0} := S̄_{x_0} \ S_{x_0}. By the previous bullet point, if Z_{t_0} belongs to ∂S_{x_0} then X_{t_0} ∈ ∂Ŝ_{x_0}. By Proposition 5.1 we then have that X_t is in the boundary of Ŝ_{x_0} for any t > t_0. That is, [...]. We know from [A.3] that, given ε as in the above, there exists a compact set K_{ε/2} ⊆ S_{x_0} such that P(Z_t ∈ K_{ε/2}) = q_t(K_{ε/2}) ≥ 1 − ε/2 for every t ≥ 0. Now, using the observation in the first bullet point of this note, one gets [...], where (K_{ε/2})^C denotes the complement in S_{x_0}. Hence ε = 0, i.e. Z_t belongs to S_{x_0} almost surely. [...] to vanish on the whole S_{x̄}.
• If we don't make any assumptions on the map W_∞, when we look at the set W_∞(S_{x_0}) it may occur that this is not a connected set; and even if it were connected, it may be contained in more than one submanifold of ∆ (see Example 6.9). If we assume that W_∞ is continuous then, because S_{x_0} is connected, W_∞(S_{x_0}) is connected as well; for simplicity, we also explicitly assume that W_∞(S_{x_0}) is contained in just one submanifold of ∆, the manifold S_{x̄}. It could also occur that on the limit manifold W_∞(S_{x_0}) the field V_0^{(⊥)} does not vanish identically. If this is the case, then one can take such a manifold as a new starting manifold and apply the theory explained here to starting points on this manifold; i.e. one can, so to speak, "repeat the procedure" illustrated here by starting the dynamics again on that manifold. So, in conclusion, one only needs to study the case in which V_0^{(⊥)}(x) = 0 for every x ∈ W_∞(S_{x_0}), see for instance Example 6.10. Again for simplicity, we assume V_0^{(⊥)}(x) = 0 for every x ∈ S_{x̄}.
• Finally, notice that if W_∞ is well defined and continuous on S_{x_0} then W_∞ is also a well-defined and continuous map from S̄_{x_0} to R^N. We show this fact in Lemma A.6, contained in Appendix A.3. Notice also that W_∞ is the identity when restricted to S_{x̄}, hence W_∞ is always well defined and continuous on S_{x̄}. What we are requiring with the last point of Hypothesis 7.4 is that the map should be continuous not only on each one of the manifolds S_{x_0}, S̄_{x_0} and S_{x̄}, but also on the union of these three sets. The reason why we need continuity also on the closure of S_{x_0} is, again, technical; see the proof of Lemma B.6. Before we consider the behaviour of X_t^{(x)} in the case when e^{tV_0^{(⊥)}}(x) is convergent, we must first consider the trivial case, i.e. the behaviour of the process when we start it from the "equilibrium manifold" S_{x̄}, where V_0^{(⊥)} vanishes; this is the content of Proposition 7.6 below. Moreover, the convergence is uniform on compact subsets of S; that is, for every compact set K ⊆ S and every f ∈ C_b(R^N) we have [...].

Proof of Proposition 7.6. The proof is deferred to Appendix B.3.
Note 7.7. The assumption that V_0^{(⊥)} = 0 on S implies that, if x ∈ S, then the map t → P_t f(x) is differentiable, for any f ∈ C_b(R^N). Indeed, as explained in the Introduction, in general P_t f is differentiable in the direction ∂_t − V_0 and in the directions contained in ∆ (see Appendix A.2) and satisfies [...] for all x ∈ S; hence P_t f is also differentiable in the direction V_0 on S. Therefore P_t f is also differentiable in time, i.e. as a map t → P_t f, and satisfies [...]. Since V_0^{(⊥)} = 0 on S_{x̄}, we can apply Proposition 7.6 to the manifold S_{x̄}, and throughout the rest of the section we shall denote by μ̄_{S_{x̄}} the invariant measure supported on S_{x̄} such that (65) holds for all x ∈ S_{x̄}. Such a measure exists and is unique by Proposition 7.6. Similarly to what we did in Section 6, equation (52), we shall extend this to a measure µ_{S_{x̄}} defined on R^N by setting [...]. For any x̄ ∈ R^N, let I_0(x̄) = {x ∈ R^N : W_∞(S_x) ⊆ S_{x̄}}. The set I_0 is contained within the basin of attraction of the measure µ_{S_{x̄}}. Indeed, Theorem 7.8 below shows that for all x ∈ I_0(x̄) we have that P_t f(x) converges to µ_{S_{x̄}}(f), for all f ∈ C_b(R^N).
Proof of Theorem 7.8. Throughout the proof we fix an arbitrary point x 0 ∈ I 0 . The proof is split into 3 steps.
• Step 1: We first construct a tight evolution system of measures, {ν_t}_{t≥0}, for the semigroup {Q_{s,t}}_{0≤s≤t}, supported on S_{x_0}. This can be done by arguing analogously to Step 1 of the proof of Theorem 6.5; in particular we may define ν_t := Q*_{0,t} δ_{x_0}.18 Note that ν_t(S_{x_0}) = 1; indeed, by Note 7.5 (second bullet point) we have that Z_t ∈ S_{x_0} almost surely when Z_0 = x_0; hence [...]. 18 Note that, using the same argument, we could define ν_t = Q*_{0,t} δ_{x̃_0} for any other starting point x̃_0.

Moreover, analogously to Step 2 in the proof of Theorem 6.5, since the family {ν_t}_t is tight, there exists a diverging sequence {t_ℓ} such that ν_{t_ℓ} converges weakly to some probability measure µ_0 as ℓ tends to ∞. • Step 2: By construction, the measure µ_0 is a measure on S̄_{x_0}; we then consider the probability measure µ_0 ∘ (W_∞)^{-1}.19 The latter measure is supported on S̄_{x̄}. One needs to show that [...]. Recall that μ̄_{S_{x̄}} is the restriction of the measure µ_{S_{x̄}} to S_{x̄}. The proof of this fact is deferred to Lemma B.7. Note that this is one of the places where we use that x_0 ∈ I_0(x̄). This implies that ν_t converges weakly to μ̄_{S_{x̄}} ∘ W_∞ as t tends to ∞. Furthermore, by Hypothesis [A.3] we can take a sequence {t_ℓ} such that t_ℓ ↗ ∞ and p_{t_ℓ}^{x_0} converges weakly to some probability measure ν_{x_0}.
• Step 3: We show that ν^{x_0} is supported on S_x and that, when we restrict it to S_x, we have ν^{x_0}|_{S_x} = µ_0 ∘ (W_∞)^{-1}. Lemma B.8 is devoted to proving this fact. Therefore, by Step 2 and the definition of µ̄_{S_x}, we conclude that p^{x_0}_t converges weakly to µ̄_{S_x} as t tends to ∞, for any x ∈ S_{x_0}.
We now give a one-dimensional example which satisfies all the assumptions we have made in this section. In particular, this example fits our framework in a non-trivial way, as it exhibits many invariant measures.
Example 7.9. Consider the SDE
dX_t = sin(X_t) dt + (1 − cos(X_t)) ∘ dB_t,
where B_t is a one-dimensional Wiener process. In this case V_0 = sin(z)∂_z and V_1 = (1 − cos(z))∂_z, and a direct computation gives
[V_0, V_1] = (sin(z)sin(z) − (1 − cos(z))cos(z))∂_z = (1 − cos(z))∂_z = V_1.
Therefore the vector fields V_0, V_1 satisfy the UFG condition; the above also shows that the obtuse angle condition (18) is satisfied, with λ_0 = 1. Moreover, it is easy to show that the function (V_1 P_t f)(x) decays exponentially fast in time, i.e. λ_0 is big enough that (18) implies an estimate of the type (19) for the field V_1. Because the coefficients of the equation are bounded, the estimate is uniform on the whole real line, see [12, Proposition 3.1, Proposition 3.4 and Theorem 4.2]; alternatively it follows by a direct calculation, see [12, Example 4.4]. Since V_0 and V_1 both vanish whenever z ∈ 2πZ, the point measures δ_{2nπ} are invariant measures for any n ∈ Z. However, there also exist invariant measures supported on (2nπ, 2(n+1)π) for any n ∈ Z. Indeed, let ρ_n(z) = C 1_{(2nπ,2(n+1)π)}(z)(…), where C is the normalization constant and 1_{(2nπ,2(n+1)π)}(z) is the characteristic function of the interval (2nπ, 2(n+1)π). By direct calculation one can verify that, for every n ∈ Z, ρ_n(z) satisfies the stationary Fokker-Planck equation L*ρ_n = 0, where
L*ρ_n(z) = −∂_z(sin(z)ρ_n(z)) + ∂_z[(1 − cos(z))∂_z((1 − cos(z))ρ_n(z))].
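Both claims of Example 7.9 can be sanity-checked. Solving the zero-flux equation −sin(z)ρ + (1 − cos z)∂_z((1 − cos z)ρ) = 0 suggests the candidate density ρ(z) ∝ e^{−1/(1−cos z)}/(1 − cos z) on (0, 2π); this candidate is our own computation and is not quoted from the text. The sketch below verifies L*ρ = 0 symbolically and, via Euler-Maruyama applied to the Itô form of the equation, illustrates that δ_0 is invariant and that paths started inside (0, 2π) remain there.

```python
import numpy as np
import sympy as sp

# --- Symbolic check: L* rho = 0 for a candidate stationary density ---
z = sp.symbols('z')
sigma_sym = 1 - sp.cos(z)                    # coefficient of V_1 = (1 - cos z) d/dz
# Candidate density on (0, 2*pi) obtained from the zero-flux condition
# (our computation, not quoted from the text):
rho = sp.exp(-1/sigma_sym) / sigma_sym
flux = -sp.sin(z)*rho + sigma_sym*sp.diff(sigma_sym*rho, z)
Lstar_rho = sp.diff(flux, z)                 # L* rho = d/dz (flux)
assert sp.simplify(Lstar_rho) == 0           # rho solves the stationary FP equation

# --- Simulation: Ito form of dX = sin(X) dt + (1 - cos X) o dB ---
rng = np.random.default_rng(0)

def simulate(x0, T=10.0, dt=1e-3):
    """Euler-Maruyama; the Stratonovich-to-Ito correction is (1/2) sigma sigma'."""
    x = np.empty(int(T / dt) + 1)
    x[0] = x0
    for i in range(len(x) - 1):
        s = 1.0 - np.cos(x[i])                         # diffusion coefficient
        drift = np.sin(x[i]) + 0.5 * s * np.sin(x[i])  # Ito-corrected drift
        x[i + 1] = x[i] + drift*dt + s*np.sqrt(dt)*rng.standard_normal()
    return x

path = simulate(np.pi)     # started inside (0, 2*pi)
frozen = simulate(0.0)     # started at a common zero of V_0 and V_1
print(0.0 < path.min() and path.max() < 2*np.pi)  # confinement: True
print(np.all(frozen == 0.0))                      # delta_0 invariant: True
```

The confinement is robust numerically because the diffusion coefficient vanishes quadratically at the endpoints while the drift pushes back into the interval.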
Notice that if X_0 ∈ [2nπ, 2(n+1)π] (for some fixed n ∈ Z) then X_t ∈ [2nπ, 2(n+1)π] for every t ≥ 0. However, even if we restrict to one of the intervals [2nπ, 2(n+1)π], the process still admits three invariant measures on each such interval.
Example 7.10 (Example 6.10 continued). Recall that in this example V_0 = sin(ζ)∂_ζ − kz∂_z (the remaining data are as in Example 6.10).
We conclude this section by stating and proving Theorem 7.11 below. In order to state it, let us define the following equivalence relation on R^N: x ∼ y if and only if x and y lie on the same maximal integral submanifold. As customary, we denote by [x] the equivalence class of x under the equivalence relation ∼. Note that, by Lemma 3.14, if x ∼ y then also e^{tV_0^{(⊥)}}x ∼ e^{tV_0^{(⊥)}}y; therefore the flow map (66) is well defined on R^N/∼. Let now q be the quotient map q : R^N → R^N/∼, q(x) = [x]; if we endow R^N/∼ with the σ-algebra {E ⊆ R^N/∼ : q^{-1}(E) is a Borel set of R^N}, then q is a measurable map. If µ is a probability measure on R^N, we define the pushforward measure µ̂ on R^N/∼ by µ̂(E) = µ(q^{-1}(E)) for all measurable E ⊆ R^N/∼.
Theorem 7.11. Consider the SDE (1) and the associated semigroup P_t, and assume that the vector fields V_0, …, V_d satisfy the UFG condition. If µ is an invariant measure for P_t, then µ̂ is an invariant measure for the flow map (66).
Proof of Theorem 7.11. Denote by B_b(R^N/∼; R) the set of all bounded and measurable functions f : R^N/∼ → R; equivalently, f ∘ q is a bounded and measurable function from R^N to R. By the definition of invariant measure, we have
∫_{R^N} (f ∘ q)(x) µ(dx) = ∫_{R^N} E[(f ∘ q)(X^x_t)] µ(dx).
Let us now look more closely at the expected value on the right-hand side of the above: for any bounded and measurable function h, we can write E[h(X^x_t)] as the sum of the contribution from the event {X^x_t ∈ S_{e^{tV_0^{(⊥)}}x}} and the contribution from the event {X^x_t ∈ ∂S_{e^{tV_0^{(⊥)}}x}}, using Proposition 5.3. The second term vanishes by Proposition 5.7 (this is easy to prove for positive h; if h is not positive, split it into its positive and negative parts). Indeed, if X^x_t ∈ ∂S_{e^{tV_0^{(⊥)}}x} then X^x_t ∈ ∂S_x (see Lemma 4.15 for a proof of this fact).
Putting everything together, we obtain a chain of equalities in which the penultimate equality follows from the fact that the quantity involved is completely deterministic, and the last equality holds by the definition of the measure µ̂. This concludes the proof.

8. Existence of a density
Analogously to what we did for the study of the long-time behaviour, we split this section into two subsections: in Section 8.1 we consider the setting of Section 6 and study SDEs of the form (42)-(44), while in Section 8.2 we consider the general UFG case. This section makes use of several notions from Malliavin calculus; we recall only some basic facts and refer the reader to [44] for more detailed background material.
Let D^{k,p} ⊆ L^p(Ω) denote the Malliavin-Sobolev space, that is, the domain of the k-th order Malliavin derivative in the space L^p(Ω). We also define the space D := ∩_{k,p} D^{k,p}. We shall denote by D′ the dual space of D, that is, the space of all continuous linear maps from D to R. Let us recall the following lemma, quoted from [44, Theorem 2.…], which gives the Malliavin derivative of the solution of an SDE: for every r ≤ t.
Here we use the notation D^k for the Malliavin derivative operator with respect to the Brownian motion B^k.^20 Define the Malliavin matrix M_t = (M^{ij}_t)_{i,j=1}^N to be
M^{ij}_t := Σ_{k=1}^d ∫_0^t (D^k_r X^i_t)(D^k_r X^j_t) dr.
Again by [44, Section 2.3] we can rewrite the Malliavin matrix in terms of the Jacobian matrix J_t := ∂X_t/∂x_0; details can be found in [44, Section 2.3]. There it is also shown that J_t is an invertible matrix and that the identity M_t = J_t C_t J_t^T holds, where C_t is the reduced Malliavin covariance matrix.
^20 Note that D^k denotes the first-order Malliavin derivative with respect to the k-th Brownian motion and is not to be confused with the k-th order Malliavin derivative.
8.1. Existence of a density on a suitable hyperplane. In this section we consider the SDE (42)-(44). We shall also assume Hypothesis 6.1 [H.1], which imposes a condition on the set of vector fields appearing in (42) (see Section 6). In this setting it is clear that the law of X_t = (Z_t, ζ_t) does not admit a density with respect to Lebesgue measure on R^{n+1}: indeed, for each fixed t, ζ_t is a deterministic point, which implies that P^x(X_t ∈ R^n × {ζ_t}) = 1, while R^n × {ζ_t} is a null set with respect to Lebesgue measure on R^{n+1}. We prove that for every fixed t ≥ 0 the law of the random variable Z_t admits a density with respect to Lebesgue measure on R^n. In terms of the process X_t, this implies that the law of X_t admits (for every fixed t ≥ 0) a density with respect to Lebesgue measure on the hyperplane H_{ζ_t} := {x = (z, ζ) : ζ = ζ_t}. Moreover, since by Section 6 we know that X_t ∈ H_{ζ_t} almost surely, H_{ζ_t} is the maximal manifold such that X_t admits a density with respect to the volume element on it.^21 To prove that the law of Z_t admits a density we shall follow the same strategy as [44, Section 2.3]. Note that by Hypothesis 3.15 and Lemma 8.1, for each t ≥ 0 and i ∈ {1, …, n}, Z^i_t and ζ_t belong to D^{1,p} for all p ≥ 1. First we note that the solution X_t = (Z_t, ζ_t) admits a Malliavin derivative.
Lemma 8.2. The Malliavin matrix of X_t = (Z_t, ζ_t) has the block structure displayed in (68), where the matrix M_t is the Malliavin matrix corresponding to Z_t.
Proof of Lemma 8.2. The proof is deferred to Appendix B.4.
In [44] it is shown that if the Malliavin matrix is invertible then the law of X t admits a density on R n+1 . We can see from (68) that the matrix is not invertible; however we show that the Malliavin matrix M t corresponding to Z t is invertible almost surely and hence the law of Z t admits a density on R n , for every fixed t > 0.
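Two toy computations, hypothetical examples rather than the systems studied in the text, illustrate the claims above. For the scalar linear SDE dX = aX dt + σ dB one has J_t = e^{at} and D_r X_t = J_t J_r^{-1} σ, and the factorisation M_t = J_t C_t J_t^T can be verified symbolically; and for a pair X = (Z, ζ) with Z_t = B_t and ζ_t deterministic, the full Malliavin matrix is singular while its Z-block is invertible, mirroring (68). A sketch:

```python
import numpy as np
import sympy as sp

# --- (i) Factorisation M_t = J_t C_t J_t^T for dX = a X dt + sigma dB ---
a, s, t_, r_ = sp.symbols('a sigma t r', positive=True)
J = sp.exp(a*t_)                              # Jacobian dX_t / dx_0
D = s * sp.exp(a*(t_ - r_))                   # D_r X_t = J_t J_r^{-1} sigma
M = sp.integrate(D**2, (r_, 0, t_))           # Malliavin matrix (scalar case)
C = sp.integrate((s*sp.exp(-a*r_))**2, (r_, 0, t_))   # reduced covariance
assert sp.simplify(M - J*C*J) == 0            # M_t = J_t C_t J_t^T

# --- (ii) Singular full matrix for X = (Z, zeta) with deterministic zeta ---
t = 1.0
r = np.linspace(0.0, t, 1001)                 # grid for int_0^t ... dr
dr = r[1] - r[0]
Dmat = np.vstack([np.ones_like(r),            # D_r Z_t = 1 when Z_t = B_t
                  np.zeros_like(r)])          # D_r zeta_t = 0 (no noise in zeta)
Mfull = (Dmat @ Dmat.T) * dr                  # Riemann sum for int_0^t D D^T dr
print(np.linalg.matrix_rank(Mfull))           # 1: singular on R^2
print(Mfull[0, 0])                            # approx. t: the Z-block is invertible
```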
Proposition 8.3. The matrix M_t factorises as M_t = J_t C_t J_t^T, where C_t is a random n × n symmetric matrix. Moreover, if we assume Hypothesis 6.1 [H.1] holds, then C_t is invertible P-almost surely.
Proof of Proposition 8.3. The proof is deferred to Appendix B.4.
^21 Throughout our discussion we need to fix a canonical reference measure on the manifold. Here, and subsequently, when we refer to the volume element on a submanifold M ⊂ R^N we mean the measure on M determined by the Riemannian density associated with the induced Riemannian metric on M.
Theorem 8.4. Let Z_t be the solution of (42). Then the law of Z_t is absolutely continuous with respect to the Lebesgue measure on R^n.
Proof of Theorem 8.4. Note that the Malliavin matrix corresponding to Z_t is M_t, which is invertible: indeed M_t = J_t C_t J_t^T, where C_t is invertible by Proposition 8.3 and J_t is invertible, so M_t is invertible as a product of invertible matrices. By [44, Theorem 2.1.2] we conclude that the law of Z_t is absolutely continuous with respect to Lebesgue measure on R^n, for each t > 0.

8.2. Existence of a density on integral submanifolds. We now return to studying the general UFG case. As in the previous section, we cannot expect that the law of X_t will in general admit a density with respect to Lebesgue measure on R^N; we will instead show that the law of X_t admits a density with respect to the volume element on a suitable manifold, namely on the time-evolving submanifold to which X_t belongs almost surely. In the first and second comments in Note 7.5 it is shown that Hypothesis 7.4 [A.4] implies that X_t cannot hit the boundary of the maximal integral submanifold.
Recall from Section 7 the process {Z_t}_t defined by (58). Since e^{-tV_0^{(⊥)}} is a diffeomorphism, the law of X_t admits a density with respect to the volume element on the corresponding submanifold if and only if the law of Z_t admits a density with respect to the volume element on S_{x_0}. Let V_{[α],t} be defined as in (62); recall that the process {Z_t}_{t≥0} satisfies the SDE (60). We now wish to apply [53, Theorem 3.4] to show that the law of {Z_t}_{t≥0} admits a density with respect to the volume measure on S_{x_0}. However, as noted in [7], there is a mistake in the proof of [53, Theorem 3.4]; in particular, the form of the Hörmander condition given by [53, Assumption (H)] is not sufficient for the conclusions of [53, Theorem 3.4] to hold. More precisely, the authors rely upon [18, Theorem 1.1.3] to show that [53, Assumption (H)] implies a suitable integration by parts formula, and this implication is shown to be incorrect by [7]. However, under our conditions an integration by parts formula does hold, as shown in [42, Section 3]. Therefore we may use the strategy given in [53] together with the results of [42] to prove that the law of Z_t admits a density with respect to the volume measure on S_{x_0}.
A vital tool for this argument is the integration by parts formula proved in [42, Theorem 3.10]: for Φ ∈ D and α_1, …, α_M ∈ A_m we have the identity (69). Let us denote by E′(S_{x_0}) the space of all distributions on S_{x_0} with compact support. Recall that we can consider a smooth function f as a distribution F_f by setting ⟨F_f, φ⟩ := ∫_{S_{x_0}} f φ dλ_{S_{x_0}}, where λ_{S_{x_0}} denotes the volume measure on S_{x_0}.
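Formula (69) itself is not reproduced above, but the mechanism behind such integration by parts formulas can be seen in the simplest Gaussian case: for F = f(B_t) one has E[f′(B_t)] = E[f(B_t) B_t / t], the one-dimensional Gaussian (Stein) identity, a stand-in for, not the statement of, [42, Theorem 3.10]. A Monte Carlo sketch with f = sin, where both sides equal e^{−t/2}:

```python
import numpy as np

rng = np.random.default_rng(1)
t = 1.0
B = rng.standard_normal(400_000) * np.sqrt(t)   # samples of B_t ~ N(0, t)

# One-dimensional analogue of a Malliavin integration-by-parts formula:
# E[f'(B_t)] = E[f(B_t) * B_t / t]   (Gaussian/Stein identity), with f = sin.
lhs = np.cos(B).mean()            # Monte Carlo estimate of E[f'(B_t)]
rhs = (np.sin(B) * B / t).mean()  # Monte Carlo estimate of E[f(B_t) B_t / t]
exact = np.exp(-t / 2)            # both sides equal e^{-t/2} for f = sin

print(lhs, rhs, exact)
```

Identities of this kind are precisely what allow derivatives to be moved off a test function and onto a stochastic weight, which is the step that produces densities.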
Lemma 8.5. Assume that Z_t satisfies (69). Then there exists a map Ψ_t : E′(S_{x_0}) → D′ which, in particular, is continuous as a map from E′(S_{x_0}) to D′.
Proposition 8.6. Fix t > 0 and let Z_t be such that the map Ψ_t is well defined for every distribution F. Let I be an open set and s ↦ F_s a family of distributions indexed by I.
(1) If I ∋ s ↦ F_s is continuous (continuously differentiable), then I ∋ s ↦ Ψ_t(F_s) is continuous (resp. continuously differentiable). In particular, for every G ∈ D the map I ∋ s ↦ ⟨Ψ_t(F_s), G⟩ is continuous (resp. continuously differentiable).
(2) If I ∋ s ↦ F_s is continuous, then for every G ∈ D we have ⟨Ψ_t(∫_I F_s ds), G⟩ = ∫_I ⟨Ψ_t(F_s), G⟩ ds, where ∫_I F_s ds is the tempered distribution defined by ⟨∫_I F_s ds, φ⟩ = ∫_I ⟨F_s, φ⟩ ds.
We can now show that the law of Z_t admits a density: for each t > 0 the law of Z^{(x_0)}_t admits a density with respect to the volume element on S_{x_0}.
Proof. Note that the map x ↦ δ_x is smooth; moreover its (weak) derivative d_{x^i} δ_x satisfies ⟨d_{x^i} δ_x, φ⟩ = ∂_{x^i} φ(x) for all φ. Therefore Ψ_t(δ_x) is smooth and, in particular, p(x) := ⟨Ψ_t(δ_x), 1⟩ is smooth. It remains to show that p(x) is the density of the law of Z_t.
The reason why we consider only the case in which the manifolds of the partition are embedded comes mostly from the need to use the Stroock and Varadhan support theorem: the closure appearing in the statement of that theorem is taken in the Euclidean sense. If the manifold topology were not the Euclidean topology, we would have to consider two closures, the closure in the Euclidean topology and the closure in the manifold topology; this would make the exposition much less transparent. Moreover, we point out that in all our examples the manifolds at hand are embedded manifolds. It is possible that, under the assumptions of this paper, namely that the vector field V_0^{(⊥)} is smooth and Lipschitz and that the integral curves of V_0^{(⊥)} are convergent, one may prove that the orbits are indeed embedded manifolds; but this is beyond the scope of this paper.
A.2. Known facts about UFG semigroups. In this appendix we gather some known facts that we use frequently.
[F.1] A semigroup P_t of bounded operators is Markov if P_t f ≥ 0 whenever f ≥ 0 and P_t 1 = 1, where 1 denotes the function identically equal to one. Denoting by ‖·‖_∞ the supremum norm, the above implies that if ‖f‖_∞ < ∞ then ‖P_t f‖_∞ ≤ ‖f‖_∞, i.e. the semigroup is a contraction in the supremum norm. Similarly, the two-parameter semigroups {Q_{s,t}}_{0≤s≤t} and {Q̄_{s,t}}_{0≤s≤t}, considered in Section 6 and Section 7, are both contractive in the supremum norm. [F.2] Note that if the vector fields V_0, V_1, …, V_d satisfy the parabolic Hörmander condition then, for any f ∈ C_b(R^N), the function P_t f(x) is smooth in all directions in R^N and moreover it is smooth in t. This is not generally the case if we only assume the UFG condition. However, for any f ∈ C_b(R^N) and t > 0 the function V_{[α]} P_t f is well defined for any α ∈ A. Moreover, for any compact set K and t > 0 there exists a constant such that an estimate of the type (19) holds on K. If the vector fields V_{[α]} are bounded then the above estimate holds uniformly on R^N; for details see [42, Chapter 3]. In contrast to the case in which the parabolic Hörmander condition is enforced, when only the UFG condition holds P_t f need not be differentiable in the direction V_0; however, it is differentiable in the direction ∂_t − V_0. For more details see [12, Appendix A].
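A discrete sketch of [F.1]: on a finite state space a Markov operator is a row-stochastic matrix (nonnegative entries, P1 = 1), and the sup-norm contraction follows immediately. The matrix below is randomly generated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# A Markov operator on a 6-point state space: nonnegative entries and
# rows summing to one (the discrete analogue of P_t f >= 0 and P_t 1 = 1).
P = rng.random((6, 6))
P /= P.sum(axis=1, keepdims=True)

f = rng.standard_normal(6)          # a bounded "test function" on 6 states
Pf = P @ f                          # discrete analogue of P_t f

assert np.allclose(P.sum(axis=1), 1.0)       # P 1 = 1
print(np.abs(Pf).max() <= np.abs(f).max())   # sup-norm contraction: True
```

Each entry of Pf is a convex combination of the values of f, which is exactly why the supremum norm cannot increase.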
[F.3] When f ∈ C^∞_V(R^N) (the space C^∞_V(R^N) has been defined in Section 2), we have that (x, t) ↦ P_t f is smooth in both x and t, i.e. it is differentiable arbitrarily many times in every direction; see [8]. When f ∈ C_b(R^N) we may take a sequence f_n ∈ C^∞_V(R^N) such that P_t f_n ∈ C^∞_V(R^N) and, for each compact set K ⊆ R^N, P_t f_n and V_{[α_1]} ⋯ V_{[α_k]} P_t f_n converge uniformly over K, as n tends to ∞, to P_t f and V_{[α_1]} ⋯ V_{[α_k]} P_t f respectively, for each k ∈ N and α_1, …, α_k ∈ A. We shall denote by D^{2,∞}_V(R^N) the space of all functions that can be approximated with the procedure just described. From what we have just said, the semigroup P_t f belongs to D^{2,∞}_V(R^N) for any f ∈ C_b(R^N). See [12, Appendix A] for more details.
Lemma A.2. Let X and Y be as in Example 3.8. Then the vector fields {X, Y} do not satisfy the UFG condition, in the sense that whether we take X = V_0 and Y = V_1 or vice versa, the UFG condition is not satisfied.
Proof of Lemma A.2. In the definition of the UFG condition take Y = V_0 and X = V_1 (the other case is simple to show) and assume that the UFG condition holds for some m ∈ N. Denote by ad_X the map which takes a vector field Z to [X, Z]. Here ψ^{(k)} denotes the k-th derivative of ψ. Now (ad_X)^k Y commutes with the vector field Y, and hence the only non-trivial vector fields in R_m are X, Y and (ad_X)^k Y for k ∈ N. By the UFG condition there exist smooth functions φ_X, φ_{Y,k} expressing the higher brackets in terms of these fields; we may rewrite this using (A.1). By considering the direction ∂_x we obtain φ_X = 0. Also note that since ψ(x) = 0 for all x < 0, we have ψ^{(k)}(x) = 0 for all x < 0 and k ∈ N; as ψ is smooth, this gives ψ^{(k)}(0) = 0 for all k ∈ N. In particular, ψ solves an initial value problem with zero initial data. However, since ψ and the functions {φ_{Y,k}}_{k≥0} are smooth, this initial value problem has an (at least locally) unique solution, and the function which is constantly zero clearly satisfies it. Therefore ψ ≡ 0 in a neighbourhood of zero, which gives a contradiction; hence the UFG condition is not satisfied.
Since the (n+1)-th components of the transformed vector fields vanish, i.e. Ṽ^{n+1}_{[α]} = 0 for all α ∈ A, we have Φ^{n+1}_{x_0}(y) = p̃_{n+1}(T) = p̃_{n+1}(0) = Φ^{n+1}_{x_0}(x). Now assume that Φ^{n+1}_{x_0}(x) = Φ^{n+1}_{x_0}(y). Let γ̃ be any smooth curve contained in Φ_{x_0}(U_{x_0}) ∩ (R^n × {Φ^{n+1}_{x_0}(x)}), and let γ̃′(0) = ṽ. Define γ = Φ^{-1}_{x_0}(γ̃) and v = γ′(0). Since γ̃ is contained within R^n × {Φ^{n+1}_{x_0}(x)}, we have ṽ ∈ R^n × {0} and hence v ∈ ∆̂(γ(0)). Therefore the tangent space to the preimage under Φ_{x_0} of such a slice is contained in ∆̂, where S_x is the maximal integral submanifold of ∆̂ which passes through x. In particular, we have that y ∈ S_x, as required.
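The vanishing-derivative fact used in the proof of Lemma A.2 above can be seen concretely on the classical bump profile ψ(x) = e^{−1/x} for x > 0, extended by 0 for x ≤ 0, a hypothetical stand-in for the function ψ of Example 3.8: all its one-sided derivatives at 0 vanish, so every ψ^{(k)}(0) = 0. A sympy check:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
psi = sp.exp(-1/x)   # profile of a smooth function vanishing on (-inf, 0]

# All right-hand derivatives vanish at 0, so extending psi by 0 on (-inf, 0]
# gives a C^infinity function whose Taylor series at 0 is identically zero.
limits = [sp.limit(sp.diff(psi, x, k), x, 0, '+') for k in range(5)]
print(limits)   # [0, 0, 0, 0, 0]
```

This is the same phenomenon the proof exploits: smoothness plus vanishing on a half-line forces every derivative at the junction point to be zero.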
Lemma A.4. Assume the vector fields V_0, …, V_d satisfy the UFG condition. Let x, y ∈ R^N be connected by an integral curve of one of the vector fields V_{[α]}, α ∈ A_m; that is, y = e^{TV_{[α]}}(x) for some T > 0 and α ∈ A_m. Then, for all h ∈ D^{2,∞}_V(R^N),^23 we have
h(y) − h(x) = ∫_0^T (V_{[α]} h)(γ(s)) ds,
where γ(s) := e^{sV_{[α]}}(x).
^23 We recall that the set D^{2,∞}_V(R^N) has been introduced in Appendix A.
Proof. Along the curve γ we have d/ds h(γ(s)) = (V_{[α]} h)(γ(s)). Integrating from 0 to T, and using that γ(0) = x and γ(T) = y, the statement follows.
Lemma A.5. With the notation of Section 6, suppose Hypothesis 6.1 [H.1] holds. For any g ∈ C_b(R^n) define the functions f(z, ζ) := g(z) and v_s(z, t) := (Q_{s−t,s} g)(z). Then v_s is smooth as a map from R^n × (0, ∞) to R; moreover it satisfies (A.2).
Proof. Using the equality of vector fields recalled in (9), and noting that, as differential operators, V^{(∆)}_0 = U_0 and V_i = U_i, we obtain that v_s satisfies (A.2). To extend the proof to the case when f belongs to C_b(R^{n+1}) we apply the argument of [12, Appendix A], so we only sketch this part of the proof. By Appendix A.2 we may take a sequence f_n such that V_{[α_1]} ⋯ V_{[α_k]} P_t f_n converges to V_{[α_1]} ⋯ V_{[α_k]} P_t f for any k ≥ 1 and α_1, …, α_k ∈ A_m. By the above argument, the analogous identities hold for each f_n; for any h > 0 we may write the corresponding difference quotients and, letting n tend to ∞, pass these identities to the limit. Letting now h tend to 0, we conclude that (P_t f)(z, ζ_{s−t}) is differentiable with respect to t. That is, v_s is differentiable in both z and t as a map from R^n × (0, ∞) to R and satisfies (A.2).
Lemma A.6. With the notation of Section 7, if the map W_∞ is well defined on S_{x_0} (in the sense that S_{x_0} ⊆ Dom(W_∞)) and is continuous when restricted to S_{x_0}, then W_∞ is also well defined and continuous on Ŝ_{x_0}.
Proof of Lemma A.6. First note that, given any point x ∈ Ŝ_{x_0}, we can find some s ∈ R and z ∈ S_{x_0} such that x = e^{sV_0^{(⊥)}}(z), in which case we have
W_∞(x) = W_∞(z). (A.3)
Now W_∞(z) is well defined by assumption, and hence W_∞(x) is well defined.
To show that W_∞ is continuous on Ŝ_{x_0}, take {x_k}_k ⊆ Ŝ_{x_0} and x ∈ Ŝ_{x_0} such that x_k → x as k tends to ∞; we must show that W_∞(x_k) converges to W_∞(x) as k tends to ∞. Let x_k = e^{s_k V_0^{(⊥)}}(z_k) and x = e^{sV_0^{(⊥)}}(z) for some s_k, s ∈ R and z_k, z ∈ S_{x_0}. Without loss of generality we may assume that s = 0; otherwise consider the sequence y_k := e^{-sV_0^{(⊥)}}(x_k). Recall from Section 4.2 that we may take a local neighbourhood U_x of x and a coordinate transformation Φ. For k sufficiently large we have x_k ∈ U_x, and hence Φ(x_k) converges to Φ(x). By the uniqueness of integral curves we have Φ(e^{sV_0^{(⊥)}}(y)) = e^{sṼ_0^{(⊥)}}(Φ(y)). Recall that Ṽ_0^{(⊥)} acts only on the last coordinate; hence the first n components of Φ(e^{s_k V_0^{(⊥)}}(z_k)) are equal to the first n components of Φ(z_k). In particular, the first n components of Φ(z_k) converge to the first n components of Φ(z). Now z_k and z lie on the same integral submanifold of ∆̂ and hence, by Lemma A.3, the last component of Φ(z_k) is equal to the last component of Φ(z). Therefore Φ(z_k) converges to Φ(z) and, since Φ is a diffeomorphism, z_k converges to z. Since W_∞ is continuous on S_{x_0}, using (A.3) we obtain W_∞(x_k) = W_∞(z_k) → W_∞(z) = W_∞(x). Therefore W_∞ is continuous on Ŝ_{x_0}.

Appendix B. Proofs
This appendix contains all the proofs that we omitted in the main text.
Proof of Lemma 3.5. In [12] the authors proved estimates of the type (21) for first-order derivatives of the semigroup P_t; more precisely, they show that if (18) is satisfied, then (19) holds. The present statement is the analogue of those results for second-order derivatives, and it can be proved with the same procedures presented in [12].
As we have already observed, the last N − n rows of the Jacobian matrix JΦ are orthogonal to vectors in ∆, see (27). Since V ∈ ∆, the statement follows. This concludes the proof.
Proof of Lemma 4.15. Since S ⊆ Ŝ, we have S̄ ⊆ Ŝ̄; therefore it is sufficient to show that if x ∈ ∂S then x ∉ Ŝ. Assume for a contradiction that there exists some x ∈ ∂S ∩ Ŝ. Since x ∈ Ŝ, there exists a neighbourhood U ⊆ Ŝ which contains x and on which the coordinate transformation Φ constructed at the beginning of Section 4.2 is well defined. Now x ∈ ∂S implies that there exists a sequence {x_k} ⊆ S such that x_k converges to x. For k sufficiently large, x_k belongs to U and, since the coordinate transformation is smooth, Φ(x_k) converges to Φ(x) (here we use the notation of Section 4.2). However, the points x_k all belong to the same maximal integral submanifold of ∆, and hence Φ^{n+1}(x_k) is constant for k large enough, by Lemma A.3. This implies, for k large enough, that Φ^{n+1}(x) = Φ^{n+1}(x_k); so by (B.3) and Lemma A.3 we have that x and x_k lie in the same maximal integral submanifold of ∆̂. This gives a contradiction, since x_k ∈ S and x ∉ S.
Proof of Lemma 4.16. We will prove that the set K := {x ∈ S_{x_0} : V_0^{(⊥)}(x) = 0} ⊆ S_{x_0} is both open and closed in (the topology of) S_{x_0}; hence it has to be the whole manifold S_{x_0} (see [SA.2] and Appendix A.1 for clarifications on the manifold topology). Such a set is clearly closed (in R^N and hence in the manifold topology), as it is the intersection between S_{x_0} and the preimage of 0 through a continuous function. To prove that it is also open, we will show that for any x ∈ K there exists an open neighbourhood O_x of x which is contained in K. Let x ∈ R^N be such that V_0^{(⊥)}(x) = 0, and let n = n(x) be the rank of ∆̂_0 at x; then there exist n vectors in ∆̂_0(x) which span ∆̂_0 at x. Notice that, by construction, such vectors must belong to ∆̂(x), since V_0^{(⊥)}(x) = 0. By the smoothness of the vector fields, and because x is a regular point for both distributions, there exists a neighbourhood O_x of x such that the same n vectors span ∆̂_0(y) for every y ∈ O_x. Because V_0^{(⊥)}(y) is orthogonal to all the vectors in ∆̂_0(y) (= ∆̂(y)), it must be the case that V_0^{(⊥)}(y) = 0 on O_x (otherwise the rank of ∆̂_0 would increase, which is impossible as the rank stays constant on the orbits). Therefore O_x ⊆ K and the proof is concluded.
Proof of Proposition 5.1. We emphasize that this proof relies heavily on the fact that the integral manifolds of ∆̂_0 coincide with the orbits of ∆̂_0; see Proposition 4.3.
• Proof of i). Let S be one of the integral manifolds of ∆̂_0 and suppose x ∈ ∂S. To begin with, we show that S_x ⊆ S̄. To this end, let y be any point in S_x; we want to show that y ∈ S̄. By Proposition 4.3 the integral manifold S_x is given by the orbit through x of the vector fields in ∆̂_0, and hence y can be written as the end point of a curve which starts from x and is a piecewise integral curve for vector fields in ∆̂_0. By considering each piece of the integral curve separately, if needed, we may assume that y = e^{TV}(x) for some T > 0 and V ∈ ∆̂_0. Since x ∈ S̄, there is a sequence {x_k}_k ⊆ S converging to x. Set y_k := e^{TV}(x_k) and note that {y_k}_k belongs to S, since S is an orbit of ∆̂_0. We have that y_k converges to y, since the map z ↦ e^{TV}(z) is continuous. Therefore y ∈ S̄, which implies that S_x ⊆ S̄, as y is an arbitrary point in S_x. However, S and S_x are both maximal integral submanifolds, so they are either disjoint or they coincide; since x ∈ S_x and x ∉ S, they must be disjoint, hence S_x ⊆ ∂S.
• Proof of ii). Note that by the Stroock and Varadhan Theorem, Theorem 4.4, we have P^x(X_t ∈ S̄_x) = 1 for any x ∈ R^N. From the reasoning in the proof of point i), we know that if x ∈ ∂S then S_x ⊆ ∂S, so that S̄_x ⊆ ∂S. Therefore, for any x ∈ ∂S, we have P^x(X_t ∈ ∂S) ≥ P^x(X_t ∈ S̄_x) = 1.
Proof of Proposition 5.3. Here we consider the case of a control path p solving (24); it suffices to prove the claim for such paths, since the result then follows by the Stroock and Varadhan Support Theorem.^24 Let us now define the set C := {t ∈ R : p(t) ∈ S_{e^{tV_0^{(⊥)}}(x_0)}}. Note that C is non-empty since 0 ∈ C; if we can show that C is open and closed as a subset of R, then we must have C = R, which implies the desired result. Let us start by showing that C is open in R. To this end, fix an arbitrary point t_0 ∈ C; without loss of generality we may assume that t_0 = 0 (otherwise we consider the path q(t) := p(t + t_0)). We will show that there exists an open neighbourhood of 0 which is contained in C. To show this fact we will make use of the (local) change of coordinates defined in Section 4.2: let Φ_{x_0} : U_{x_0} → Ũ_{x_0} be the coordinate transformation. To prove that C is closed, suppose t_0 ∉ C; then, by the same argument as above, we have that p(t_0) ∈ S_{e^{(t_0−t)V_0^{(⊥)}}(p(t))} for t close to t_0. Now, by Lemma 3.14 applied to the points p(t) and e^{tV_0^{(⊥)}}(x_0) and to the vector field V_0^{(⊥)}, we deduce p(t_0) ∈ S_{e^{t_0 V_0^{(⊥)}}(x_0)}, and we have a contradiction since t_0 ∉ C. That is, C is closed in R and we have C = R, as required.
^24 Note that by Theorem 4.4 the path {X^{(x_0)}_t(ω)}_{t∈[0,T]} is a limit, in C([0,T], ‖·‖_∞), of solutions to the control problem (24). Because uniform convergence implies pointwise convergence, for each fixed t ≥ 0 the point X_t is a limit of {p(t) : p is a solution to (24)}.
Proof of Proposition 5.7. For every x ∈ R^N and t > 0, let g_t(x) be the probability whose positivity defines membership of E_t, so that E_t = {x ∈ R^N : g_t(x) > 0}. Suppose the SDE (1) admits an invariant measure µ. Because E = ∪_{t>0} E_t, if we prove that µ(E_t) = 0 for every t ≥ 0, then it follows that µ(E) = 0 (as {E_t}_{t≥0} is an increasing family of sets). So we concentrate on proving the first statement. To this end, define S^ℓ to be the union of all the maximal integral submanifolds of ∆̂_0 of dimension ℓ, and notice that ∪_{ℓ=0}^N S^ℓ = R^N; moreover, for every (arbitrary but fixed) t > 0, set E^ℓ_t := {x ∈ S^ℓ : P^x(X^x_t ∉ S^ℓ) > 0}. We now proceed in two steps.
• Step 1: show that if x ∈ E_t then there exists some ℓ such that x ∈ E^ℓ_t. Fix x ∈ E_t and let ℓ denote the dimension of S_x. By the Stroock and Varadhan Support Theorem (Theorem 4.4) we have P^x(X^x_t ∈ S̄_x) = 1 for every t ≥ 0. By Proposition 5.1, ∂S_x is contained in the set ∪_{k<ℓ} S^k; in particular, ∂S_x is disjoint from S^ℓ. Since x ∈ E_t we have g_t(x) > 0, and therefore P^x(X^x_t ∉ S^ℓ) > 0, which, by definition, gives x ∈ E^ℓ_t.
• Step 2: show that µ(E^ℓ_t) = 0 for all ℓ ∈ {0, …, N}. To this end, set g^ℓ_t(x) := P^x(X^x_t ∉ S^ℓ); then the set of x ∈ S^ℓ such that g^ℓ_t(x) > 0 is the set E^ℓ_t. Therefore it is sufficient to show that ∫_{S^ℓ} g^ℓ_t(x) µ(dx) = 0 for all ℓ ∈ {0, …, N}. Assume this is not the case; that is, assume there exists some ℓ̄ such that
∫_{S^{ℓ̄}} g^{ℓ̄}_t(x) µ(dx) > 0. (B.7)
We let ℓ̄ be the maximum index such that (B.7) holds. Since µ is an invariant measure, we may decompose the corresponding integral over the sets S^k. Fix k ∈ {0, …, N} and first consider the case k > ℓ̄. Since ℓ̄ was chosen to be maximal such that (B.7) holds, we must have ∫_{S^k} g^k_t(x) µ(dx) = 0. This is equivalent to saying that the µ-measure of the set E^k_t = {x ∈ S^k : P^x(X^x_t ∉ S^k) > 0} is zero. Since k ≠ ℓ̄, we have {x ∈ S^k : P^x(X^x_t ∈ S^{ℓ̄}) > 0} ⊆ E^k_t, so the µ-measure of the set {x ∈ S^k : P^x(X^x_t ∈ S^{ℓ̄}) > 0} is zero as well. Now consider the case k < ℓ̄; in this case the corresponding contribution vanishes too. Combining the above contradicts (B.7), and hence the statement holds.
Proof of Lemma 5.9. Fix x, y ∈ S and f ∈ C_b(R^N), and assume first that x, y are such that there exists a path γ : [0, T] → R^N with γ(0) = x, γ(T) = y and γ′(t) = V_{[α]}(γ(t)), for some α ∈ A_m. Clearly the final time T will depend on x and y, T = T_{x,y}. By Lemma A.4 and Appendix A.2, we can express P_t f(y) − P_t f(x) as an integral of V_{[α]} P_t f along γ. Take a compact set K such that K ⊇ γ([0, T_{x,y}]); then by (19) the integrand decays exponentially in t, uniformly on K. Letting t tend to ∞ we obtain the result. For arbitrary x, y ∈ S we can take a piecewise integral curve connecting x and y; applying the above argument to each piece of the curve, we obtain (40). This implies (41). Because (41) holds for every x ∈ S, by the uniqueness of the limit we have that µ must be the only invariant measure supported on S.
It remains to show that µ is ergodic. Suppose there exist t > 0 and a Borel set E ⊆ R^N such that P_t 1_E = 1_E µ-almost everywhere. Then, by the semigroup property, for every n ∈ N we have P_{nt} 1_E = 1_E µ-almost everywhere. Now, squaring and integrating (41) with respect to µ, we have that P_t f converges to ∫_S f dµ in L²(µ) for each f ∈ C_b(R^N). Since C_b(R^N) is dense in L²(µ) (see [52, Theorem 3.14]), the same convergence holds for all f ∈ L²(µ). By taking a subsequence if necessary we get convergence µ-almost everywhere; in particular, taking f = 1_E, we deduce that 1_E is µ-almost everywhere constant, so that µ(E) ∈ {0, 1} and µ is ergodic.
Lemma B.1. Assume Hypothesis 6.1 holds and that the semigroup {Q_{s,t}}_{0≤s≤t} admits an evolution system of measures {ν_t}_{t≥0}; then, for each g ∈ C_b(R^n), z ∈ R^n and s ≥ 0, we have (B.12).
Proof of Lemma B.1. Fix g ∈ C_b(R^n) and let f ∈ C_b(R^{n+1}) be a function that does not depend on the last variable and such that f(z, η) = g(z) for every η ∈ R, z ∈ R^n; note that by (47) we have (B.13). Now fix z, y ∈ R^n. By Note 6.2, the hyperplane S := {x = (z, ζ) ∈ R^{n+1} : ζ = ζ_s} is the orbit of the vector fields V_{[α]}, α ∈ A_m. Since (z, ζ_s) and (y, ζ_s) belong to S, we may take a piecewise integral curve connecting them; without loss of generality we may take an integral curve γ : [0, T] → R^{n+1} connecting (z, ζ_s) and (y, ζ_s), with γ′_t = V_{[α]}(γ_t). Clearly the time T will depend on z and y, i.e. T = T_{z,y}. Let K be a compact set such that γ([0, T_{z,y}]) ⊆ K, and apply Lemma A.4 to the function P_{t−s} f. Because we let t → ∞, we can restrict to the case t > s; so fix s_0 > 0 such that t − s > s_0. By (19) we then obtain, letting t tend to ∞ and using (B.13),
lim_{t→∞} |Q_{s,t} g(z) − Q_{s,t} g(y)| = 0. (B.14)
The proof can now be concluded as follows: because {ν_t}_t is an evolution system of measures (see (53)), we can write
|Q_{s,t} g(z) − ∫_{R^n} g dν_t| ≤ ∫_{R^n} |Q_{s,t} g(z) − Q_{s,t} g(y)| ν_s(dy).
Using (B.14) and the dominated convergence theorem (which is applicable by Appendix A.2 [F.1]), we may take the limit as t tends to ∞ and obtain (B.12).
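The defining property of an evolution system of measures used above, ∫ Q_{s,t} g dν_s = ∫ g dν_t, can be checked by Monte Carlo in a toy case (a hypothetical stand-in, not the system of Section 6): for the Ornstein-Uhlenbeck equation dZ = −Z dt + dB the constant family ν_t = N(0, 1/2) is an evolution system of measures, and the exact Gaussian transition makes the check straightforward.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in: dZ = -Z dt + dB has the exact transition
#   Z_t = e^{-(t-s)} Z_s + N(0, (1 - e^{-2(t-s)})/2),
# and nu_t = N(0, 1/2) for all t is an evolution system of measures:
#   int Q_{s,t} g  d nu_s  =  int g  d nu_t   for all s <= t.
s, t = 0.0, 2.0
n = 400_000
Zs = rng.standard_normal(n) * np.sqrt(0.5)    # Z_s ~ nu_s = N(0, 1/2)
decay = np.exp(-(t - s))
Zt = decay * Zs + np.sqrt((1 - decay**2) / 2) * rng.standard_normal(n)

lhs = np.cos(Zt).mean()   # Monte Carlo for int Q_{s,t} g d nu_s, with g = cos
rhs = np.exp(-0.25)       # int cos d N(0, 1/2) = e^{-1/4}
print(lhs, rhs)
```

The sample variance of Zt also stays at 1/2, which is exactly the statement that the family {ν_t} is carried onto itself by the two-parameter semigroup.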
In order to prove Lemma B.4, which is the core of the last step of the proof of Theorem 6.5, we must first prove the following two results, Lemma B.2 and Lemma B.3.
Lemma B.2. Assume Hypothesis 6.1 holds. Then, for each g ∈ C_b(R^n) and z ∈ R^n, we have Q_{s−t,s} g(z) → Q̄_t g(z) as s tends to ∞, uniformly on compact subsets of R^n × (0, ∞). That is, for every fixed T > 0 and r > 0 we have
lim_{s→∞} sup_{z ∈ B̄_r, t ∈ [1/T, T]} |Q_{s−t,s} g(z) − Q̄_t g(z)| = 0.
Proof of Lemma B.2. Fix g ∈ C_b(R^n) and consider v_s(z, t) := (Q_{s−t,s} g)(z), for z ∈ R^n and t > 0. As in the proof of Lemma B.1, define f ∈ C_b(R^{n+1}) by f(z, η) = g(z) for all z ∈ R^n and η ∈ R; then by (B.13) we have
v_s(z, t) = P_t f(z, ζ_{s−t}). (B.15)
Since (P_t f)(x) is a continuous function, we may take the limit as s tends to infinity to obtain lim_{s→∞} v_s = P_t f(z, ζ̄) = Q̄_t g(z) =: v(z, t), where ζ̄ denotes the limit of ζ_t.
We now wish to show that the above limit is uniform on compact subsets of R^n × (0, ∞). To show this fact we shall use the Ascoli-Arzelà Theorem. Indeed, assuming for the moment that we can apply such a theorem, we can find a subsequence s_k such that v_{s_k} converges uniformly on B_R × [1/T, T]. Since v_{s_k} converges pointwise to v, the limit is independent of the choice of subsequence; hence v_s converges uniformly on B_R × [1/T, T] to v as s tends to ∞. So, if we show that the derivatives of v_s are bounded on B_R × [1/T, T], uniformly in s, then we may apply the Ascoli-Arzelà Theorem and the proof is concluded by the above line of reasoning. By Lemma A.5 the function v_s is smooth in (z, t) ∈ R^n × R and satisfies (A.2).
For any point (z, t) ∈ R^n × (0, ∞) there exist an open neighbourhood of (z, t) and smooth functions φ_{i,α} such that, for any i ∈ {1, …, n}, the derivative ∂_i ≡ ∂_{z^i} can be expressed as a combination of the vector fields V_{[α]} with coefficients φ_{i,α}. Here we have used that ζ_t is convergent, and hence B̄_R × {ζ_t : t ≥ −T} is a compact subset of R^{n+1}. Similarly we may bound the second-order derivatives V_i² v_s and, using (A.2), we obtain a bound for the derivative with respect to t which is independent of s.
Using the tightness of the family {ν_t}_{t≥0}, there exists a divergent sequence t_ℓ such that ν_{t_ℓ − k} converges weakly to some measure µ_k as ℓ tends to ∞, for each k ∈ N. (We emphasise that, by a diagonal argument, the sequence t_ℓ can be chosen to be independent of k.) Moreover, {µ_k}_{k∈N} is tight, since {ν_t}_{t≥0} is tight (see [1, Step 2 in the proof of Theorem 6.2]).
Lemma B.3. Assume Hypothesis 6.1 holds and construct {µ_k}_{k∈N} as above. Then
∫_{R^n} Q̄_k g(z) µ_k(dz) = ∫_{R^n} g(z) µ_0(dz), for any g ∈ C_b(R^n) and every k ∈ N.
Proof of Lemma B.3. We will consider the integral ∫_{R^n} Q_{t_ℓ−k, t_ℓ} g(z) ν_{t_ℓ−k}(dz) and show the following: ∫_{R^n} g(z) µ_0(dz) = lim_{ℓ→∞} ∫_{R^n} Q_{t_ℓ−k, t_ℓ} g(z) ν_{t_ℓ−k}(dz) = ∫_{R^n} Q̂_k g(z) µ_k(dz), for every k ∈ N. (B.16) Let us start by showing the first equality in (B.16). Because {ν_t}_{t≥0} is an evolution system of measures (and taking ℓ sufficiently large that t_ℓ > k), we have ∫_{R^n} Q_{t_ℓ−k, t_ℓ} g(z) ν_{t_ℓ−k}(dz) = ∫_{R^n} g(z) ν_{t_ℓ}(dz).
The above, combined with the fact that ν_{t_ℓ} converges weakly to µ_0, gives the first identity in (B.16).
To prove the second equality in (B.16), observe the following decomposition: ∫_{R^n} Q_{t_ℓ−k, t_ℓ} g(z) ν_{t_ℓ−k}(dz) − ∫_{R^n} Q̂_k g(z) µ_k(dz) = ∫_{R^n} (Q_{t_ℓ−k, t_ℓ} g(z) − Q̂_k g(z)) ν_{t_ℓ−k}(dz) + (∫_{R^n} Q̂_k g(z) ν_{t_ℓ−k}(dz) − ∫_{R^n} Q̂_k g(z) µ_k(dz)) =: I_{1,ℓ} + I_{2,ℓ}. Now I_{2,ℓ} converges to 0 as ℓ → ∞ since ν_{t_ℓ−k} converges weakly to µ_k, by definition of µ_k. To see that I_{1,ℓ} vanishes as ℓ tends to ∞, fix ε > 0 and take a ball B_r such that ν_{t_ℓ−k}(B_r) ≥ 1 − ε for all ℓ with t_ℓ > k. This is possible since the family {ν_t : t ≥ 0} is tight. By Lemma B.2 we know that Q_{t_ℓ−k, t_ℓ} g(z) converges uniformly on compacts to Q̂_k g(z); hence, if ℓ is sufficiently large, we have sup_{z∈B_r} |Q_{t_ℓ−k, t_ℓ} g(z) − Q̂_k g(z)| ≤ ε.
As ε is arbitrary, I_{1,ℓ} converges to 0 as ℓ tends to ∞, and the claim follows.
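Spelling out the bound on I_{1,ℓ} (a restatement of the argument above, using only the trivial bounds |Q_{s,t} g| ≤ ‖g‖_∞ and |Q̂_k g| ≤ ‖g‖_∞ together with the tightness ball B_r), one gets, for ℓ sufficiently large:

```latex
\[
|I_{1,\ell}|
\;\le\; \sup_{z\in B_r}\big|Q_{t_\ell-k,\,t_\ell}\,g(z)-\widehat{Q}_k g(z)\big|
   \;+\; 2\|g\|_\infty\,\nu_{t_\ell-k}(B_r^{\,c})
\;\le\; \varepsilon + 2\|g\|_\infty\,\varepsilon .
\]
```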
Lemma B.4. Assume Hypothesis 6.1 holds and, as described before the statement of Lemma B.3, let µ_0 be the weak limit of the sequence ν_{t_ℓ}. Then µ_0 = μ̄.
Proof of Lemma B.4. Take g ∈ C_b(R^n). By Lemma 6.4 we know that Q̂_k g(z) → μ̄(g) as k tends to ∞, for each z ∈ R^n and g ∈ C_b(R^n). By an argument analogous to the one used in the proof of Lemma B.2, Q̂_k g(z) converges to μ̄(g) locally uniformly in z ∈ R^n. Now fix ε > 0; since {µ_k}_k is a tight sequence, we may take a ball B_r ⊆ R^n such that µ_k(B_r) ≥ 1 − ε for all k ∈ N. Moreover, for k sufficiently large we have sup_{z∈B_r} |Q̂_k g(z) − μ̄(g)| ≤ ε.
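The same splitting as in the proof of Lemma B.3 then gives, for k sufficiently large (again a restatement, using |Q̂_k g| ≤ ‖g‖_∞ and the ball B_r above):

```latex
\[
\Big|\int_{\mathbb{R}^n}\widehat{Q}_k g(z)\,\mu_k(dz)-\bar\mu(g)\Big|
\;\le\; \sup_{z\in B_r}\big|\widehat{Q}_k g(z)-\bar\mu(g)\big|
   \;+\; 2\|g\|_\infty\,\mu_k(B_r^{\,c})
\;\le\; \varepsilon + 2\|g\|_\infty\,\varepsilon .
\]
```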
However, by Lemma B.3 we also have ∫_{R^n} g(z) µ_0(dz) = ∫_{R^n} Q̂_k g(z) µ_k(dz) for every k ∈ N; combining this with the above estimates and letting k tend to ∞ gives ∫_{R^n} g(z) µ_0(dz) = μ̄(g) for every g ∈ C_b(R^n), hence µ_0 = μ̄.

Therefore, if we show that the functions φ̃(x, t) := ϕ_{α,β}(e^{t V_0^(⊥)}(x)) are smooth, bounded and belong to the sets C_V^∞(R^N × R), then the vector fields {∂_t + V_{[0],t}, V_{[1],t}, . . . , V_{[d],t}} satisfy the UFG condition when viewed as vector fields in both the time variable t and the spatial variables z. Since ϕ_{α,β} is smooth and bounded, and V_0^(⊥) is smooth, the composition ϕ_{α,β} ∘ e^{t V_0^(⊥)} is smooth and bounded; it remains to show that the required bounds hold for any k ∈ N and γ_1, . . . , γ_k ∈ A. Therefore the UFG condition is satisfied.
Proof of Proposition 7.3. For f ∈ C_V^∞(R^N) we have that P_t f is smooth (in every direction). Now by a density argument analogous to the one in the proof of Lemma A.5 we obtain the result for f ∈ C_b(R^N). To prove the result for g ∈ C_b(S̄_{x_0}) we may apply the Tietze Extension Theorem, see [19, Chapter 2, Theorem 5.4], to extend g to a function f ∈ C_b(R^N) such that f = g on S̄_{x_0} (this is where we need g to be continuous up to and including the closure of S_{x_0}: functions that are continuous on open sets do not necessarily admit a continuous extension to the whole of R^N, so the Tietze Extension Theorem would not apply). Since Z_t takes values in S_{x_0} for every t ≥ 0, we have Q_{s,t} g(z) = Q_{s,t} f(z) for any z ∈ S_{x_0}, hence the claim follows.
Proof of Proposition 7.6. By Hypothesis 7.4 [A.3], the family of measures {p_t^x}_{t≥0} is tight and hence, by Prokhorov's Theorem, there exist a measure µ_S and a diverging sequence {t_k}_k such that p_{t_k}^x converges weakly to µ_S as t_k ↗ ∞. Note that in general the sequence {t_k}_k and the measure µ_S may depend on the choice of x ∈ S_{x_0}; however, by Lemma 5.9, p_{t_k}^x(·) = (P_{t_k} 1_{(·)})(x) converges weakly to µ_S for any choice of x ∈ S. We now show that such convergence is also independent of the choice of divergent sequence. Let {s_k}_k be a sequence such that s_k ↗ ∞ and fix f ∈ C_b(R^N) and x ∈ S. By (19) (and (21)) there exists a constant C = C(t_0, x) > 0 such that the corresponding gradient estimate holds for all t > t_0; using that V_0^(⊥) = 0, it follows that |P_{t_k} f(x) − P_{s_k} f(x)| vanishes as k tends to ∞, and hence P_{s_k} f(x) converges to µ_S(f). Therefore P_t f(x) converges to µ_S(f) as t tends to ∞.
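In display form, the independence of the limit from the chosen divergent sequence is the following triangle-inequality argument (with {t_k}_k and {s_k}_k as above; this merely restates the step just carried out):

```latex
\[
|P_{s_k} f(x)-\mu_S(f)|
\;\le\; |P_{s_k} f(x)-P_{t_k} f(x)| \;+\; |P_{t_k} f(x)-\mu_S(f)|
\;\xrightarrow[k\to\infty]{}\; 0 .
\]
```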
To show that µ_S is an invariant measure, take an arbitrary s ≥ 0 and f ∈ C_b(R^N); then µ_S(P_s f) = lim_{t→∞} P_t P_s f(x) = lim_{t→∞} P_{t+s} f(x) = lim_{t→∞} P_t f(x) = µ_S(f).
Hence µ_S is an invariant measure. To show that the convergence is uniform on compact subsets of S, we apply the Arzelà-Ascoli Theorem. Indeed, fix a compact set K ⊆ S; it is then sufficient to show that P_t f(x) has derivatives on K that are bounded uniformly in t. However, x ↦ P_t f(x) is differentiable in the directions V_{[α]} for all α ∈ A, which span the tangent space of S, and, by the Obtuse Angle Condition, Assumption [A.5], (19) holds for all t > t_0. Hence P_t f(x) converges to µ_S(f) uniformly on compact subsets of S. Note that, since (65) holds for all f ∈ C_b(R^N), there is at most one measure satisfying (65).
We now move on to prove Lemma B.7 and Lemma B.8, which are the backbone of the proof of Theorem 7.8. Throughout this section, for any f ∈ C_b(S̄_x), we let f̃ be defined as in (B.23). In order to prove Lemma B.7 we first state and prove the following two results; the first of them, Lemma B.5, asserts that (B.24) holds uniformly for s ∈ [1/T, T] and z ∈ K, whenever lim_{τ→∞} e^{τ V_0^(⊥)}(z) exists for all z ∈ K.
Proof of Lemma B.5. The proof is analogous to the proof of Lemma B.2, so we only sketch it and point out the main differences. Note that P_t f̃ is continuous; using (61), we have that (B.24) holds pointwise. To obtain convergence uniform on compact subsets of S_{x_0} × (0, ∞), we use the Arzelà-Ascoli Theorem together with an estimate analogous to that in the proof of Lemma B.2, using that e^{τ V_0^(⊥)}(x) is convergent and that the map W_∞ is assumed continuous.
Define µ_k to be the probability measure such that ν_{t_ℓ−k} converges weakly to µ_k as ℓ → ∞; this measure is constructed analogously to the discussion preceding Lemma B.3.
Lemma B.6. Let Hypothesis 7.4 [A.1] and [A.7] hold, and assume the semigroup {Q_{s,t}}_{s≤t} admits a tight evolution system of measures {ν_t}_{t≥0} supported on S_{x_0}. Then (B.25) below holds for any f ∈ C_b(S̄_x), with f̃ defined as in (B.23).
Proof of Lemma B.6. This proof is completely analogous to the proof of Lemma B.3, so we only point out the main differences. It suffices to prove the two equalities in (B.25); as in the proof of Lemma B.3, the first relies on the fact that {ν_t}_{t≥0} is an evolution system of measures, which gives ∫ Q_{t_ℓ−k, t_ℓ} f(z) ν_{t_ℓ−k}(dz) = ∫ f(z) ν_{t_ℓ}(dz).
Since ν_{t_ℓ} converges weakly to µ_0 and W_∞ is a continuous map from S_{x_0} to R^N, by the continuous mapping theorem we have that ν_{t_ℓ} ∘ (W_∞)^{−1} converges weakly to µ_0 ∘ (W_∞)^{−1}, and hence we obtain (B.25). To prove the second equality in (B.25), as in the proof of Lemma B.3 we write the difference as I_{1,ℓ} + I_{2,ℓ}, with I_{1,ℓ} = ∫ (Q_{t_ℓ−k, t_ℓ} f(z) − P_k f̃(W_∞(z))) ν_{t_ℓ−k}(dz). Observe that on the image of W_∞ we have V_0^(⊥) = 0, by Hypothesis 7.4 [A.7], and hence P_k f̃(W_∞(z)) = P_k f(W_∞(z)); therefore we can rewrite I_{2,ℓ} as