The cutoff phenomenon in total variation for nonlinear Langevin systems with small layered stable noise

This paper provides an extended case study of the cutoff phenomenon for a prototypical class of nonlinear Langevin systems with a single stable state perturbed by an additive pure jump L\'evy noise of small amplitude $\varepsilon>0$, where the driving noise process is of layered stable type. Under a drift coercivity condition the associated family of processes $X^\varepsilon$ turns out to be exponentially ergodic with equilibrium distribution $\mu^{\varepsilon}$ in the total variation distance, which extends a result from Peng and Zhang (2018) to arbitrary polynomial moments. The main results establish the cutoff phenomenon with respect to the total variation distance under a sufficient smoothing condition of Blumenthal-Getoor index $\alpha>3/2$. That is to say, in this setting we identify a deterministic time scale $\mathfrak{t}_{\varepsilon}^{\mathrm{cut}}$ satisfying $\mathfrak{t}_\varepsilon^{\mathrm{cut}} \rightarrow \infty$, as $\varepsilon \rightarrow 0$, and a respective time window, $\mathfrak{t}_\varepsilon^{\mathrm{cut}} \pm o(\mathfrak{t}_\varepsilon^{\mathrm{cut}})$, during which the total variation distance between the current state and its equilibrium $\mu^{\varepsilon}$ essentially collapses as $\varepsilon$ tends to zero. In addition, we extend the dynamical characterization under which the latter phenomenon can be described by the convergence of such a distance to a unique profile function, first established in Barrera and Jara (2020), to the L\'evy case with nonlinear drift. This leads to sufficient conditions which can be verified in examples, such as gradient systems subject to small symmetric $\alpha$-stable noise for $\alpha>3/2$. The proof techniques differ completely from the Gaussian case due to the absence of respective Girsanov transforms, which would couple the nonlinear equation and its linear approximation asymptotically even for short times.


Introduction.
Roughly speaking, the term cutoff phenomenon with respect to a distance d_1 refers to the following asymptotic dynamics: consider a parametrized family of stochastic processes (X^ε)_{ε>0}, X^ε = (X^ε_t)_{t≥0}, such that for each ε > 0 the process X^ε has a unique limiting distribution µ^ε. Then, as ε decreases to 0, the function t → d_ε(X^ε_t, µ^ε), given as a suitably renormalized distance d_ε (of d_1) between the law of X^ε_t and the corresponding limiting distribution µ^ε, essentially resembles the step function t → diam · 1_{[0, t^cut_ε]}(t). This function descends from the value diam ∈ (0, ∞], the d_1-diameter of the set of distributions over the state space, to the value 0 at a deterministic cutoff time scale t^cut_ε, which tends to ∞ as ε → 0. In other words, the function ε → t^cut_ε is such that for times smaller than t^cut_ε − w^cut_ε, where w^cut_ε = o(t^cut_ε) and 1/w^cut_ε = o(1), the previous distance is asymptotically maximal, and it is asymptotically zero for times larger than t^cut_ε + w^cut_ε. In certain situations a proper limit can be taken, and the limiting function gives rise to a so-called cutoff profile function connecting the asymptotic values diam and 0 monotonically. This abrupt convergence phenomenon was first described by Aldous and Diaconis [2] in the early eighties to conceptualize the collapse of the total variation distance between Markov chain marginals related to card shuffling and the uniform limiting distribution. Since then, this behavior has been studied by numerous authors and in different, mainly discrete, settings. For instance, we refer to Diaconis [23], Martínez and Ycart [45] and Levin et al. [44] for the Markov chain setting; Chen and Saloff-Coste [18] considered some ergodic Markov processes; Lachaud [41] and Barrera [5] treat the case of Ornstein-Uhlenbeck processes driven by a Brownian motion, to name but a few.
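For intuition, the cutoff time scale is explicitly computable in the classical scalar toy case of an Ornstein-Uhlenbeck process dX^ε_t = −X^ε_t dt + ε dW_t with X^ε_0 = x, studied e.g. in [5, 41]. The following minimal numerical sketch (the grid-based integration of the total variation distance is our own illustrative choice, not taken from the text) exhibits the step-function behavior around t^cut_ε = log(|x|/ε):

```python
import numpy as np

def tv_gauss(m1, v1, m2, v2):
    """Total variation distance between N(m1, v1) and N(m2, v2) in 1D,
    computed by numerically integrating |p - q| / 2 on a fine grid."""
    grid = np.linspace(-100.0, 100.0, 400001)
    dx = grid[1] - grid[0]
    p = np.exp(-(grid - m1) ** 2 / (2 * v1)) / np.sqrt(2 * np.pi * v1)
    q = np.exp(-(grid - m2) ** 2 / (2 * v2)) / np.sqrt(2 * np.pi * v2)
    return 0.5 * np.sum(np.abs(p - q)) * dx

# dX = -X dt + eps dW, X_0 = x:  X_t ~ N(e^{-t} x, eps^2 (1 - e^{-2t}) / 2),
# equilibrium N(0, eps^2 / 2).  TV is invariant under rescaling by 1/eps.
def d_eps(t, x, eps):
    return tv_gauss(np.exp(-t) * x / eps, (1 - np.exp(-2 * t)) / 2, 0.0, 0.5)

x = 1.0
for eps in (1e-2, 1e-3):
    t_cut = np.log(abs(x) / eps)          # cutoff time scale
    before = d_eps(0.5 * t_cut, x, eps)   # well before t_cut: distance ~ maximal
    after = d_eps(2.0 * t_cut, x, eps)    # well after t_cut: distance ~ 0
    print(f"eps={eps:g}: d(0.5*t_cut)={before:.3f}, d(2*t_cut)={after:.3f}")
```

As ε shrinks, the transition from distance near 1 to distance near 0 concentrates around t^cut_ε, which is the abrupt collapse described above.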
Further standard texts on the cutoff phenomenon include [1,3,11,13,14,15,18,22,24,25,40,42,43,44,46,63,68] and the references therein.
This article provides a case study on the cutoff phenomenon in the (unnormalized) total variation distance for the strong solution process X^ε of a class of stochastic differential equations with nonlinear coercive vector field −b, with a non-degenerate stable state 0, subject to an additive pure jump Lévy process L at ε-small amplitude,

dX^ε_t = −b(X^ε_t) dt + ε dL_t for t ≥ 0. (1.1)

Similar, and in some sense simpler, settings have been studied before: the case of nonlinear, coercive vector fields −b subject to a Brownian perturbation L = W with respect to the total variation [8,9], and two cases of linear, asymptotically exponentially stable drifts −b = −Q (that is, all eigenvalues have negative real part, but the matrix is not necessarily coercive, see [64]) subject to pure jump Lévy noises L [6,10], in the total variation and the Wasserstein distance, respectively. This paper yields the first results on the cutoff phenomenon for nonlinear coercive, pure jump Lévy SDEs, which is fraught with technical difficulties, inheriting (a) regularity issues from the linear case [10] and (b) earning additional challenges due to the lack of available (short-time) coupling results between Lévy SDEs with nonlinear vector fields and their Ornstein-Uhlenbeck approximations.
The regularity issue (a) is overcome by the careful choice of the setting of a class of locally layered stable noise processes, by which we generalize the notion of layered stable processes, introduced by Houdré and Kawai [33], and by the equator condition inspired by [60]. Regularity results for densities of SDEs, which turn out to be crucial for results in the total variation distance, have been extensively studied, for instance in [20,29,34,37,57].
The nonlinear coupling problem (b) is essentially reduced to the control of two partial errors of a different nature, addressed in Proposition 3 and Proposition 2. The first error consists of the total variation distance between the short time linear inhomogeneous Ornstein-Uhlenbeck (Freidlin-Wentzell first order) approximation under linear and nonlinear initial conditions. A slight extension of Theorem 3.1 in [33] provides a stable local limit theorem on the short-range behavior, which allows for an appropriate coupling in the proof of Proposition 3. The second error, which is dominated in the statement of Proposition 2, represents the crucial part of the proof of the main results. It directly compares the nonlinear process X^ε with its linear inhomogeneous Ornstein-Uhlenbeck approximation for short times. While there are very recent short-time couplings for SDEs with different (nonlinear) drifts under a Brownian driver (see Eberle and Zimmer [27]), to our knowledge the literature on respective pure jump counterparts is virtually nonexistent. In order to obtain a short-time coupling between the linear and the nonlinear vector field, we use Plancherel's theorem and appropriate differential inequalities for the characteristic function of a strongly localized version of X^ε for Blumenthal-Getoor index α > 3/2. Despite our best efforts, it seems hard to derive with this technique the correct (exponential) integrability of the tails of the characteristic function, even in the linear, scalar Gaussian case, and at the same time it remains unclear how to relax this condition. The same sort of technical difficulty concerning the Fourier approach arises in condition (a), p. 345 of [29].
The difficulty of the nonlinear case studied in this article can be informally understood as follows. In the linear case b(x) = −Qx, it is well-known that by the variation-of-constants formula X^ε_t can be written as the sum of the deterministic matrix exponential dynamics plus the respective stochastic convolution. Since the total variation distance is well-behaved under deterministic and mutually independent components, it can be dominated without too much effort in the linear case. This program was carried out in [10]. In the nonlinear, additive noise case X^ε_t can be written analogously, but it exhibits an additional error term. That is, X^ε is given as the sum of the nonlinear deterministic dynamics, its stochastic (nonlinear) convolution with the noise and an additional random term representing the (implicit) nonlinear residual of the noise, which is neither deterministic nor independent of the noise convolution and therefore not easily dominated in total variation. Beyond that, the aforementioned random residual term turns out to be a challenge since there is no analogue of Slutsky's lemma for the total variation distance, even in the case of smooth densities. A counterexample is given in Subsection 1.3.5. On a more abstract level, the additional difficulties encountered are illustrated by the Wasserstein upper bounds of the total variation, which require additional density gradient estimates (see Theorem 2.1 in [17]).
Our results cover the important examples of overdamped gradient systems, such as the Fermi-Pasta-Ulam-Tsingou potential, perturbed by pure jump Lévy processes with Blumenthal-Getoor index α > 3/2 in a strong sense, such as symmetric α-stable processes, symmetric tempered α-stable processes in Rosiński [52] and the symmetric Lamperti-α-stable process [16]. If, in addition, the limiting distribution turns out to be rotationally invariant, the existence of a cutoff profile is shown to be equivalent to a computational linear algebra eigenvector problem first established in [6] for the easier situation of the Wasserstein distance. This characterization is given as a specific orthogonality condition on the (generalized) eigenvectors of the linearization −Db(0) of −b in the stable state 0. It allows us to carry over several results from the linear case under the Wasserstein distance in [6] to the case of a nonlinear vector field −b and the total variation distance. In physics terminology, our results can be restated as follows: the existence of a cutoff profile is equivalent to the absence of non-normal growth effects in −Db(0) in the case of rotationally invariant limiting distributions in the nonlinear setting.
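To make the gradient setting concrete, the following small numerical sketch checks the drift coercivity condition (Hypothesis 1 below) for a hypothetical one-dimensional quartic confinement potential V(x) = x²/2 + x⁴/4, so that b = ∇V; the potential is our own toy choice rather than one of the examples treated in the text:

```python
import numpy as np

rng = np.random.default_rng(2)
grad_V = lambda x: x + x ** 3   # b = grad V for V(x) = x^2/2 + x^4/4 (d = 1)

# Hypothesis 1 (coercivity): <x - y, b(x) - b(y)> >= delta |x - y|^2.
# Sample random pairs and record the worst observed coercivity ratio.
worst = min(
    ((x - y) * (grad_V(x) - grad_V(y))) / (x - y) ** 2
    for x, y in rng.uniform(-5, 5, (100_000, 2)) if x != y
)
print(worst)   # >= delta = 1: the quartic confinement only strengthens coercivity
```

Here the ratio equals 1 + x² + xy + y² ≥ 1, so the numerical minimum stays at or above δ = 1 regardless of the sampled pairs.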
For a complete comparison of the different settings and results, and in order to avoid a lengthy introduction, we refer to the following self-explanatory table. The nonlinear Wasserstein setting, with results in the spirit of [6], is studied in the paper [7].
Along the text of the manuscript we prove several results which, to our knowledge, have not been present in the literature. (1) In Theorem 1 we generalize the strong ergodicity result of Theorem 4.1 in [50] from moments β ≥ 2 to any β > 0. The proof is given in Subsection D. (2) We introduce the class of locally layered stable processes, which is precisely the class of processes for which the short-range behavior in Theorem 3.1 in [33] is valid, despite the authors' slightly stronger formulation in terms of the stronger notion of layered stable processes.
(3) In Proposition 7 we give an elementary proof of the local β-Hölder continuity of the characteristic exponents in the case of β ∈ (0, 1]-moments in Subsection C.1. (4) We also provide a complete overview of the behavior of the estimates of matrix exponentials and related flows for an asymmetric matrix in Appendix A.
The manuscript is organized in two large sections and an extended Appendix. The first section contains the exposition of the setting, the main results Theorem 2 and Theorem 3, the examples and the concluding global steps of the proof of the main results, which reduce the proof to Propositions 1-4. The proofs of Propositions 1, 2, 3 and 4 are given in the subsections of Section 2, starting with Subsection 2.1. The Appendix is divided into the four sections A-D. Section A provides all necessary fine results on the deterministic dynamics. Appendix B provides quantitative estimates of the Freidlin-Wentzell first order approximation. Appendix C gives several auxiliary technical results, some of which we have not been aware of in the literature, such as the local β-Hölder continuity of a Lévy process in the presence of arbitrary β-moments. Appendix D yields the proof of Theorem 1, which implies the exponential ergodicity of X^ε towards µ^ε and extends a result by [50] to the case of an arbitrary positive finite moment. Let b ∈ C²(R^d, R^d) be a vector field with b(0) = 0 satisfying the following coercivity condition.

Hypothesis 1 (Coercivity).
Assume that there exists a positive constant δ such that

⟨x − y, b(x) − b(y)⟩ ≥ δ |x − y|² for all x, y ∈ R^d, (1.2)

where |·| and ⟨·, ·⟩ denote the Euclidean norm and the standard inner product on R^d, respectively.
In this manuscript we are interested in the stochastically perturbed analogue of the dynamical system given as the global solution flow (ϕ^x_t)_{t≥0} of the ordinary differential equation

(d/dt) ϕ^x_t = −b(ϕ^x_t), ϕ^x_0 = x ∈ R^d. (1.3)

It is well-known that Hypothesis 1 implies the well-posedness of (1.3). Furthermore, in our setting inequality (1.2) is equivalent to

⟨y, Db(x) y⟩ ≥ δ |y|² for all x, y ∈ R^d,

where Db(x) denotes the derivative of the vector field b at the point x.
As a consequence, |ϕ^x_t| ≤ e^{−δt} |x| for any t ≥ 0 and x ∈ R^d, i.e. 0 is an asymptotically exponentially stable fixed point of (1.3). For our purposes, however, we need a precise description of the convergence to 0 in terms of the spectral decomposition of −Db(0). This is the purpose of the following lemma, which characterizes the asymptotics of ϕ^x_t as t tends to ∞ and slightly refines the classical and well-known result by Hartman-Grobman [31,32] under Hypothesis 1. This lemma turns out to be crucial for the precise shape of the cutoff time and time window. The proof of this result is given in Lemma B.2 of [9]. Remark 1.2. For x ∈ R^d, x ≠ 0, λ_x corresponds to a real part of some eigenvalue of Db(0), and {v_k, k = 1, . . . , m} are elements of the Jordan decomposition of Db(0) according to the flag of the eigenspaces (along increasing real parts of the corresponding eigenvalues) containing x. For any generic choice of x, λ_x corresponds to the smallest real part of the eigenvalues of Db(0).
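The exponential decay of the flow can be checked numerically. The following minimal sketch integrates dϕ/dt = −b(ϕ) by explicit Euler for the hypothetical coercive field b(x) = x + x³ (our own toy choice, with δ = 1) and verifies the bound |ϕ^x_t| ≤ e^{−δt}|x|:

```python
import math

def b(x):                       # hypothetical coercive field: <x, b(x)> >= |x|^2
    return x + x ** 3

def flow(x0, t_end, dt=1e-4):
    """Explicit Euler scheme for d(phi)/dt = -b(phi), phi_0 = x0."""
    x = float(x0)
    for _ in range(int(round(t_end / dt))):
        x -= dt * b(x)
    return x

x0, delta = 2.0, 1.0
for t in (1.0, 3.0, 5.0):
    phi = flow(x0, t)
    # decay bound from coercivity: |phi_t| <= e^{-delta t} |x0|
    print(f"t={t}: |phi_t|={abs(phi):.5f} <= {math.exp(-delta * t) * x0:.5f}")
```

The cubic part only accelerates the contraction, so the numerically integrated trajectory stays strictly below the exponential envelope.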

The stochastic perturbation εdL.
On a given probability space (Ω, F, P) consider a Lévy process L = (L_t)_{t≥0} with values in R^d, i.e. a stochastic process with càdlàg paths, independent and stationary increments, issued from 0. Its marginals are determined by the celebrated Lévy-Khintchine formula

E[e^{i⟨u, L_t⟩}] = e^{tΨ(u)} for any u ∈ R^d and t ≥ 0,

with characteristic exponent Ψ. Let (F_t)_{t≥0} be the enhanced natural filtration of L satisfying the usual conditions of Protter [51]. The stochastic analogue of the dynamical system (1.3) is described by the following stochastic differential equation. For ε > 0, we consider

dX^{ε,x}_t = −b(X^{ε,x}_t) dt + ε dL_t, X^{ε,x}_0 = x ∈ R^d, (1.6)

which under Hypothesis 1 has a unique strong solution X^{ε,x} = (X^{ε,x}_t)_{t≥0}. This strong solution satisfies the strong Markov property with respect to the filtration (F_t)_{t≥0}, see for instance p. 1026 in [66] and the references therein.
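Equation (1.6) has no closed-form solution in general, but it is straightforward to simulate. The sketch below (our own illustration; the scalar drift, parameters and symmetric α-stable driver are hypothetical choices) combines an Euler scheme with the Chambers-Mallows-Stuck sampler for symmetric α-stable increments, using that L_{t+h} − L_t is distributed as h^{1/α} times a standard symmetric α-stable variable:

```python
import numpy as np

def sym_stable(alpha, rng):
    """One standard symmetric alpha-stable sample, Chambers-Mallows-Stuck method."""
    U = rng.uniform(-np.pi / 2, np.pi / 2)
    E = rng.exponential(1.0)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos(U - alpha * U) / E) ** ((1 - alpha) / alpha))

def euler(x0, b, eps, alpha, T, n, rng):
    """Euler scheme for dX = -b(X) dt + eps dL with alpha-stable increments."""
    dt = T / n
    x = x0
    for _ in range(n):
        x = x - b(x) * dt + eps * dt ** (1 / alpha) * sym_stable(alpha, rng)
    return x

rng = np.random.default_rng(0)
b = lambda x: x + x ** 3                 # hypothetical coercive drift
paths = [euler(1.0, b, 0.01, 1.8, 5.0, 5000, rng) for _ in range(100)]
print(np.median(np.abs(paths)))          # near equilibrium: of order eps
```

After a few units of time the deterministic part has contracted and the marginal law fluctuates on the scale ε around the stable state, in line with the small-noise picture behind the cutoff analysis.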
1.2.3. Exponential ergodicity and regularity of the limiting distributions µ ε . a) Hypotheses on the Lévy measure: The existence of invariant measures is known to be true for systems with as little as logarithmic moments [35], however we need exponential ergodicity in the total variation distance, which typically needs some (arbitrarily low) finite moments and regularity of the transition kernel for the Lévy measure, see for instance [38]. Both requirements are met by the class of Lévy measures defined below.
In the light of the local limit theorem (see for instance Theorem 3.1 in [33] and Theorem 4.1 in [50]), it is not surprising that our results are shown for a specific class of Lévy processes. In what follows, we assume that the Lévy process L has no Gaussian component and that its Lévy measure belongs to the following class. Let ν be a Lévy measure on (R^d, B(R^d)). Then ν is called a locally layered stable Lévy measure with parameters (ν_0, ν_∞, Λ, q, c_0, α) if the following is satisfied. There exist σ-finite Borel measures ν_0 and ν_∞ with ν = ν_0 + ν_∞, where ν_∞ is a finite measure with support contained in {|z| > 1} and ν_0 is given in polar coordinates by a finite positive measure Λ on S^{d−1} (the unit sphere in R^d) and a locally integrable function q : (0, 1] × S^{d−1} → (0, ∞), for which there exist a positive function c_0 ∈ L¹(Λ) and a parameter α ∈ (0, 2) such that q(r, θ) ∼ c_0(θ) r^{−1−α}, as r → 0, for Λ-almost all θ ∈ S^{d−1}. A pure jump Lévy process with a locally layered stable Lévy measure is called a locally layered stable Lévy process.
This notion naturally generalizes the notion of a layered stable Lévy measure (and the respective Lévy process), introduced in Definition 2.1 of [33], to all Lévy measures for which Theorem 3.1 there (short-range behavior) remains valid under Hypothesis 2. They include more general tail measures ν_∞ than the layered stable Lévy measures given in [33], such as tempered stable Lévy measures defined in [52] and Lamperti stable Lévy measures [16]. The following more restrictive notion is tailor-made to strengthen the result of Theorem 3.1 in [33] to convergence in the total variation distance, which turns out to be crucial in the proof of Proposition 3. In addition, in Theorem 4 in Appendix D we extend Theorem 4.1 of [50] and show that under Hypothesis 2 the system (1.6) is strongly ergodic in the total variation distance.
In the sequel, we give sufficient conditions for an abrupt convergence of X^{ε,x}_t to its unique limiting distribution µ^ε, as ε → 0, in the total variation distance.
b) The total variation distance ‖·‖_TV: Before we introduce the concept of cutoff formally, we recall the notion of the total variation distance. Given two probability measures P and Q defined on the same measurable space (Ω, F), the total variation distance between P and Q is given by

‖P − Q‖_TV = sup_{A ∈ F} |P(A) − Q(A)|.

For simplicity, in the case of two random vectors X and Y defined on the same probability space (Ω, F, P) we write ‖X − Y‖_TV for the total variation distance between L(X) and L(Y), the laws under P of the random vectors X and Y, respectively. For the sake of intuitive reasoning, and in a conscious abuse of notation, we write ‖X − µ_Y‖_TV instead of ‖X − Y‖_TV, where µ_Y is the distribution of the random vector Y. For a complete understanding of the total variation distance, we refer to Chapter 2 of the monograph of Kulik [39] and the references therein. c) Exponential ergodicity with smooth limiting measure.
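For laws on a common finite set, the supremum over events reduces to half the ℓ¹-distance of the probability vectors; a minimal sketch of this standard identity (our own illustration, using the convention ‖P − Q‖_TV = sup_A |P(A) − Q(A)| with diameter 1):

```python
import numpy as np

def tv_discrete(p, q):
    """||P - Q||_TV = sup_A |P(A) - Q(A)| = (1/2) * sum_i |p_i - q_i|."""
    return 0.5 * np.sum(np.abs(np.asarray(p) - np.asarray(q)))

print(tv_discrete([0.5, 0.5], [0.5, 0.5]))   # identical laws: 0.0
print(tv_discrete([1.0, 0.0], [0.0, 1.0]))   # disjoint supports: 1.0
print(tv_discrete([0.7, 0.3], [0.5, 0.5]))   # partial overlap: 0.2
```

The disjoint-support case already hints at the sensitivity for singular laws mentioned below: two laws with disjoint supports are at maximal distance no matter how close they are in a weak sense.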
As we mentioned before, we are interested in the cutoff phenomenon under the total variation distance, which is a rather robust distance for continuous distributions and rather sensitive for discrete distributions. It is therefore natural to assume the following additional hypothesis, which with the help of Hypothesis 3 yields smooth densities for the finite time marginals and the limiting distribution of (1.6).
Hypothesis 4 (Equator condition [60]). Let ν satisfy Hypothesis 3. The support of the measure Λ is not contained in the intersection of any proper subspace of R^d with S^{d−1}. Furthermore, we assume condition (1.10), where the essential infimum is understood with respect to the spectral measure Λ of ν.
The equator condition is motivated by the definition given in Simon [60], p.4. It provides a non-degeneracy condition on the support of Λ on S d−1 .
Remark 1.6. It is not hard to see that Hypothesis 4, i.e. (1.10), implies the corresponding non-degeneracy estimate. The following lemma links Definition 1.4 and Hypothesis 4 to the celebrated Orey-Masuda regularity condition, which is used in the proof of Proposition 2.
The following result is a slight generalization of Theorem 4.1 in [50] and guarantees that under Hypotheses 1, 2, 3 and 4 the system (1.6) is strongly ergodic under the total variation distance.
We remark that in dimension d = 1, a classical result by Kulik (see Proposition 0.1 in [38]) implies that the solution of (1.6) enjoys exponential ergodicity without assumption (1.9), and implicitly Theorem 1 holds for general locally layered stable Lévy measures in this case. For higher dimensions, we use the sufficient conditions including (1.9) in [50] and our generalizations of their results given in Appendix D. We point out that for the special case of symmetric α-stable Lévy processes, assumption (1.9) is automatically satisfied and [65] yields exponential ergodicity in any dimension. Theorem 1. Assume Hypotheses 1, 2, 3 and 4 for α ∈ (0, 2) and β > 0. Then for any ε > 0, there exist a unique invariant distribution µ^ε and positive constants C_ε, θ_ε such that for all x ∈ R^d, the law of the unique strong solution X^{ε,x} of (1.6) satisfies

‖X^{ε,x}_t − µ^ε‖_TV ≤ C_ε (1 + |x|^β) e^{−θ_ε t} for any t ≥ 0.

The proof is a direct corollary of Theorem 4 given in Appendix D. The tracking of the dependence ε → (θ_ε, C_ε) is typically hard to follow through the discretization procedure laid out by Meyn and Tweedie [47]. In the special case of finite variation, the backtracking of ε can be carried out partially; we refer to [38]. According to [12] and the references therein, there are three notions of cutoff phenomenon, of increasing strength. The most restrictive notion is called profile cutoff, which provides the precise asymptotic shape of the collapse of the total variation distance. Profile cutoff implies a weaker concept called window cutoff, which states abrupt convergence within a precise time interval but loses the precise profile. Window cutoff is generalized further to the notion of cutoff, in which we retain the abrupt convergence along the time scale corresponding to the center of the interval, however without a quantification of the window. Definition 1.8. For any ε > 0 and x ∈ R^d, let X^{ε,x} be the solution of (1.6) with a unique limiting distribution µ^ε.
We say that for x ∈ R^d the family (X^{ε,x})_{ε∈(0,1]} exhibits i) a cutoff at the time scale (t^x_ε)_{ε∈(0,1]}, where t^x_ε → ∞ as ε → 0, if it satisfies the respective limit relation. The cutoff time scale t^x_ε is sometimes referred to as the center of the cutoff window and w^x_ε as its width. As mentioned above, iii) implies ii) and ii) implies i).
The first main result of this study reads as follows.
Remark 1.9. For x = 0, there is no cutoff phenomenon since the linearization vanishes and intuitively cannot compete with the ergodicity. For details see Remark 2.1. For a complete discussion of the easier case of the Wasserstein distance, we refer to Section 3.2 in [6].
Assume the hypotheses of Theorem 2 to be satisfied for some x ∈ R^d \ {0}, with λ_x, θ^x_1, . . . , θ^x_m and v^x_1, . . . , v^x_m given in Lemma 1.1. We define the ω-limit set for the dynamics of (v(t, x))_{t≥0} by (1.13), which due to the left-hand side of (1.5) does not include the null vector, i.e. 0 ∉ ω(x).
Remark 1.10. Note that ω(x) ≠ ∅. Indeed, a Cantor diagonal argument for any limiting sequence in (1.4) yields the existence of a subsequence (t_j)_{j∈N} with t_j → ∞, as j → ∞, such that for any k = 1, . . . , m the limit lim_{j→∞} e^{i t_j θ_k} = ϑ_k exists. Moreover, |ϑ_k| = 1 for all k. In an abuse of notation, let Z_∞ denote a parametrization of the unique invariant distribution of the Ornstein-Uhlenbeck process. We have the following characterization of profile cutoff.

Theorem 3 (Dynamical cutoff profile characterization).
Assume the hypotheses of Theorem 2 to be satisfied for some x ∈ R^d \ {0}. Recall the ω-limit set ω(x) given in (1.13). Then the family (X^{ε,x})_{ε∈(0,1]} exhibits a profile cutoff as ε → 0 at the enhanced time scale (t^x_ε, w^x_ε) given by Theorem 2, with an explicit profile function, if and only if for any a > 0 the map ω(x) ∋ u ↦ ‖(a·u + Z_∞) − Z_∞‖_TV is constant, see (1.14). Observe that ω(x) = {v_x} immediately implies profile cutoff by the preceding theorem. The latter, indeed, is satisfied in the subsequent case of a gradient potential.
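The constancy condition can be probed numerically. For rotationally invariant Z_∞ it reduces to |u| being constant on ω(x) (see Corollary 1.15 below), i.e. to the modulus of v(t, x) = Σ_k e^{iθ_k t} v_k being asymptotically constant in t. The following sketch (the vectors and the frequency are hypothetical toy data of our own) contrasts an orthogonal family with equal real/imaginary norms against a non-normal one:

```python
import numpy as np

def modulus_along_flow(v1, p, q, theta, ts):
    """Range of |v1 + 2(cos(theta t) p - sin(theta t) q)| over the times ts;
    this is the radial spread of the omega-limit set of v(t, x)."""
    vals = [np.linalg.norm(v1 + 2 * (np.cos(theta * t) * p
                                     - np.sin(theta * t) * q)) for t in ts]
    return min(vals), max(vals)

ts = np.linspace(0.0, 100.0, 20001)
# orthogonal family with |p| = |q|: omega(x) lies on a sphere -> profile cutoff
lo, hi = modulus_along_flow(np.array([1.0, 0, 0]), np.array([0, 0.5, 0]),
                            np.array([0, 0, 0.5]), np.sqrt(2), ts)
print(hi - lo)    # ~ 0: constant modulus
# non-normal case |p| != |q|: the modulus oscillates -> window cutoff only
lo2, hi2 = modulus_along_flow(np.array([1.0, 0, 0]), np.array([0, 0.5, 0]),
                              np.array([0, 0, 0.2]), np.sqrt(2), ts)
print(hi2 - lo2)  # > 0: omega(x) is not contained in a sphere
```

In the first configuration the modulus is constant by Pythagoras, so ω(x) sits inside a sphere; breaking the norm equality produces a genuinely oscillating modulus and the profile characterization fails.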
Corollary 1.11. Let the assumptions of Theorem 2 be satisfied and assume that b = ∇V is a gradient field for a potential V. Then the family (X^{ε,x})_{ε∈(0,1]} exhibits a profile cutoff as ε → 0 at the enhanced time scale (t^x_ε, w^x_ε), where λ_x > 0 and τ_x are the positive constants in the Hartman-Grobman decomposition of Lemma 1.1, with an explicitly given profile function. Remark 1.12. Note that the dependence of λ_x on x can be complicated; however, it is rather weak in the following qualitative sense: λ_x = λ for Lebesgue-almost every x ∈ R^d, where λ is the smallest eigenvalue of the positive definite symmetric matrix D²V(0). Corollary 1.13. Assume the hypotheses of Theorem 2 to be satisfied for some x ∈ R^d \ {0}. If there exists an invertible d × d matrix A such that the distribution of AZ_∞ is rotationally invariant and the image set satisfies Aω(x) ⊂ {|z| = r} for some r = r_x > 0, then the family (X^{ε,x})_{ε∈(0,1]} exhibits a profile cutoff as ε → 0 at the enhanced time scale (t^x_ε, w^x_ε). Remark 1.14. For the non-degenerate Gaussian case we refer to Lemma A.2 in [8]. There, the law of Z_∞ is N(0, Σ), so that for a suitable invertible matrix A the law of AZ_∞ is rotationally invariant. Hence the sphere condition Aω(x) ⊂ {|z| = r} for some r = r_x > 0 is equivalent to the profile cutoff, see Corollary 2.11 in [8]. However, in the generic Lévy case, no such symmetry of the law of Z_∞ can be expected. Note that one can always find an invertible bi-measurable map T rendering the image of Z_∞ rotationally invariant; however, it is highly nonlinear in general, and therefore the proof of Corollary 1.13 breaks down.

Corollary 1.15 (Geometric profile characterization under rotationally invariant Z_∞).
Assume the hypotheses of Theorem 2 to be satisfied for some x ∈ R^d \ {0} and let Z_∞ be rotationally invariant. Then the image set satisfies ω(x) ⊂ {|z| = r} for some r = r_x > 0 if and only if the family (X^{ε,x})_{ε∈(0,1]} exhibits a profile cutoff as ε → 0 at the enhanced time scale (t^x_ε, w^x_ε). Remark 1.16 (Generic normal growth profile characterization). In the sequel, we characterize when the function ω(x) ∋ u → |u| is constant for the generic case of the setting in Corollary 1.15. We enumerate v_1, . . . , v_m given in Lemma 1.1 as follows. Without loss of generality we assume that θ_1 = 0; otherwise we take v_1 = 0 and eliminate it from the sum Σ_{k=1}^m e^{iθ_k t} v_k. Without loss of generality let m = 2n + 1 for some n ∈ N. We assume that v_k and v_{k+1} = v̄_k are complex conjugates for all even indices k. (1) If the real parts and the imaginary parts of the (complex) vectors v_2, . . . , v_{2n} in the Hartman-Grobman Lemma 1.1 form an orthogonal family and |ℜ(v_{2k})| = |ℑ(v_{2k})| for all k, then Lemma E.1 in [6] implies that ω(x) ⊂ {|z| = r} for some r = r_x > 0, and hence Corollary 1.13 yields a profile cutoff. Proof of Corollary 1.13. We apply the characterization given in Theorem 3. Let v_1, v_2 ∈ ω(x) and a > 0. For A given in the statement, we have |Av_1| = |Av_2| = r. Then there exists an orthogonal matrix O such that O(Av_1) = Av_2. Theorem 5.2 of [21] and the invertibility of O and A imply a first equality of total variation distances. Since O(Av_1) = Av_2 and O is orthogonal, the rotational invariance of AZ_∞ implies a second equality. Again, Theorem 5.2 of [21] yields a third one. Combining the preceding equalities we obtain, for any v_1, v_2 ∈ ω(x) and a > 0, the identity (1.14) and hence the desired profile cutoff.
Proof of Corollary 1.15. By Corollary 1.13 (with A = I_d) it is enough to prove the converse implication. Since the family (X^{ε,x})_{ε∈(0,1]} exhibits a profile cutoff as ε → 0 at the enhanced time scale (t^x_ε, w^x_ε), the claim follows from the characterization of Theorem 3 and the rotational invariance of Z_∞. When the vector field is given by b(x) = Ax, x ∈ R^d, for a general deterministic d × d matrix A whose eigenvalues have positive real parts, the cutoff phenomenon is completely discussed in Theorem 2.3 of [10] under Hypothesis (H), which is covered by Hypothesis 4. It is well-known that such linear systems are more general than linear systems satisfying Hypothesis 1. For instance, the classical linear oscillator with friction γ > 0 has eigenvalues with real parts in (−∞, −γ/2] but fails to be coercive, see [6].
The case of pure Brownian motion is covered in detail in Section 3.3 in [9]. For degenerate driving noise processes L and general cutoff results in the Wasserstein distance, we refer to [6]. There, complex systems of linear oscillators in a thermal bath are covered.
and satisfies Hypothesis 1. Indeed, the Jacobian of b at x can be computed explicitly in terms of the Hessian matrix H_η(x) of η at x. By (1.17) we obtain for any x, y ∈ R^d

⟨y, Db(x)y⟩ = ⟨Ay, Ay⟩ + 3⟨Bx, Bx⟩⟨By, By⟩ + ⟨H_η(x)y, y⟩ ≥ δ|y|²,

where δ = δ_1 − δ_2 > 0. Hence the vector field b = ∇V satisfies Hypothesis 1. We consider the solution of (1.3) with vector field b = ∇V. Note that in this case |ω(x)| = 1, where ω(x) is given in (1.13). Note that equation (1.6) does not admit an explicitly known solution of the Kolmogorov forward equation for the densities, not even in the simplest case d = 1, L = W a standard Wiener process, A = B = 1 and η ≡ 0. For any dimension d and L = W a standard Brownian motion, however, the invariant density is well-known to be proportional to exp(−2V(x)/ε²); for a complete discussion, see for instance Section 2.2 in [59]. For the case of dimension d = 1, nonlinear b satisfying Hypothesis 1 and L = W a standard Brownian motion, the authors prove profile cutoff for (1.6) in [8]. For higher dimensions, window cutoff is established in this case and the existence of profile cutoff is characterized; we refer to [9]. We remark that the authors strongly use the hypo-ellipticity property and the resulting regularization by the generator of the Brownian diffusion.
For a strongly locally layered stable noise L satisfying Hypotheses 2, 3 and 4, Corollary 1.11 implies for α > 3/2 the presence of a cutoff profile. In particular, the system exhibits cutoff in the sense of equation (18.3) in Chapter 18 of the monograph [44] for any η ∈ (0, 1), where the mixing time is given by

Profile vs Window cutoff for nonlinear oscillations.
In the sequel we analyze a class of nonlinear oscillators for which the existence of a cutoff profile is studied in detail. We consider the nonlinear system below for some η ∈ R, any x_1, x_2 ∈ R, positive constants δ_1, δ_2, and G ∈ C²(R², R). Assume that b(0, 0) = (0, 0)*. We verify that b is a non-gradient vector field. The Jacobian matrix of b can be computed explicitly. Since for any η ≠ 0 the Jacobian matrix Db is asymmetric, there is no potential for b and consequently b is non-gradient. For a rotationally invariant α-stable noise L in R² with α > 3/2, Theorem 2 yields window cutoff. In the sequel, we study the presence of a cutoff profile. In this setting Z_∞ is rotationally invariant: indeed, there is K_α > 0 such that the characteristic function of Z_∞ has the respective rotationally invariant form. For a := ∂_{11}G(0, 0) = ∂_{22}G(0, 0) and ∂_{12}G(0, 0) = 0 we obtain the linearization explicitly. Assume a negative discriminant Δ := (2δ_2 − 2δ_1)² − 4η² < 0 and δ_1 + δ_2 + a > 0. Then the eigenvalues are complex conjugates, with associated complex eigenvectors, whose respective families of real and imaginary part vectors are given accordingly. (1) Nonlinear non-gradient system with a cutoff profile: For δ_1 = δ_2 the real and imaginary parts are related by an orthogonal matrix O. Therefore, whenever 2δ_1 + a > 0, Corollary 1.13 with A = I_2 yields profile cutoff.
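The normal growth condition for the linearization can be tested mechanically. The following sketch uses a hypothetical 2×2 stand-in for Db(0) of such an oscillator, with dampings d_1, d_2 and rotation strength η (all toy values of our own); it evaluates the phase-invariant quantity v·v (bilinear, without conjugation) for a complex eigenvector v = p + iq, which vanishes exactly when p ⊥ q and |p| = |q|:

```python
import numpy as np

def sphere_defect(d1, d2, eta):
    """For Q = [[d1, eta], [-eta, d2]] (hypothetical linearization Db(0)),
    return |v . v| (no conjugation) for a complex eigenvector v = p + i q.
    The value is 0 iff p and q are orthogonal with equal norms, and it is
    independent of the arbitrary phase of the returned eigenvector."""
    Q = np.array([[d1, eta], [-eta, d2]], dtype=float)
    w, V = np.linalg.eig(Q)
    k = int(np.argmax(np.abs(w.imag)))    # pick a genuinely complex eigenvalue
    v = V[:, k]
    return abs(v @ v)

print(sphere_defect(1.0, 1.0, 2.0))   # d1 = d2: defect ~ 0 -> profile cutoff
print(sphere_defect(1.0, 3.0, 2.0))   # d1 != d2: defect > 0 -> window cutoff only
```

For equal dampings the eigenvectors are proportional to (1, ±i), so the defect vanishes and the sphere condition of Corollary 1.13 holds; unequal dampings introduce non-normal growth and destroy the profile.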
Since Z_∞ is the limiting distribution, as t → ∞, of the Ornstein-Uhlenbeck process (Z_t)_{t≥0}, Lemma 1.7 implies that Z_∞ has a C^∞ density f_∞. In the sequel we study the one-dimensional case. Theorem 53.1 in [55] yields that f_∞ is unimodal with mode m, that is to say, it is increasing on (−∞, m) and decreasing on (m, +∞), and hence f_∞(m) > 0. In the sequel, we determine the asymptotics of the profile function ρ → ‖(e^{−ρ} e^{−λ_x τ_x} v_x + Z_∞) − Z_∞‖_TV at zero and at infinity for some special cases. The density f_∞ of Z_∞ is explicitly accessible only in a limited number of cases. We start with the asymptotics for ρ ≪ −1. Without loss of generality (f_∞ is smooth), we assume that z > 0. By Scheffé's lemma the left-hand side of the resulting equality tends to zero as z → ∞. Hence the right-hand side implies m_z → −∞ and m_z + z → ∞, as z → ∞. We obtain an expression which reduces in the symmetric case to one in terms of F_∞, the cumulative distribution function of Z_∞.
We compare the prototypical shapes of the tails of the profile functions.
I) For the symmetric α-stable process L we obtain an explicit profile function, where C_α is an explicit constant. II) The asymptotically doubly exponential shape of the profile for the case of Gaussian tails F_∞ is discussed in Remark 2.3 in [5], which reads accordingly in our setting. We continue with the asymptotics of the profile function at zero. As a consequence of the preceding limit we obtain that, as ρ → ∞, the profile is asymptotically proportional to the respective Wasserstein profile [6].
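For Gaussian Z_∞ the profile is available in closed form, since ‖N(m, σ²) − N(0, σ²)‖_TV = 2Φ(|m|/(2σ)) − 1 = erf(|m|/(2√2 σ)). A minimal sketch (σ = 1 and the shift constant c are our own toy choices) showing the asymptotically exponential decay at rate e^{−ρ} as ρ → ∞:

```python
import math

def gaussian_profile(rho, c=1.0):
    """||N(c e^{-rho}, 1) - N(0, 1)||_TV = erf(c e^{-rho} / (2 sqrt(2)))."""
    return math.erf(c * math.exp(-rho) / (2 * math.sqrt(2)))

for rho in (0.0, 2.0, 5.0, 10.0):
    print(rho, gaussian_profile(rho))
# for large rho, erf is ~linear at 0, so the profile decays like C e^{-rho},
# i.e. proportionally to the corresponding Wasserstein profile
```

In particular, the ratio of consecutive profile values at large ρ approaches e, reflecting the linear (Wasserstein-like) regime of the profile near zero.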

1.3.5.
Counterexample to Slutsky's lemma in total variation distance.
The following example is given for completeness, since we are not aware of a counterexample in the literature. It is based on private communication with professors M. Jara and R. Imbuzeiro Oliveira. Let (U_n)_{n∈N} be a sequence of random variables with the uniform distribution supported on the set {j/n : j = 1, . . . , n} and, independently, a sequence (R_n)_{n∈N} of random variables with continuous uniform distribution supported on the interval [0, a_n], where (a_n)_{n∈N} is any sequence of positive numbers such that na_n → 0 and a_n → 0, as n → ∞. For each n ∈ N, we define X_n = U_n + R_n and Y_n = −R_n. Then the following holds.
(1) X n and Y n are absolutely continuous with respect to the Lebesgue measure on R.
Items (1), (2), (3) and (5) are straightforward. In the sequel we verify (4). Since U_n and R_n are independent, the convolution formula yields the density f_n of X_n. First, for z ≤ 0, it follows that f_n(z) = 0 for all n ∈ N. Next, for z > 1 there exists n_0 = n_0(z) ∈ N such that 0 < a_n < z − 1 for all n ≥ n_0. Hence, for all j = 1, . . . , n we have 0 < a_n < z − j/n for all n ≥ n_0. Consequently, f_n(z) = 0 for all n ≥ n_0. We continue with the case z ∈ (0, 1]. Then there exists n_1 := n_1(z) ∈ N such that z/2 < z − a_n < z for all n ≥ n_1, and the stated convergence holds for all n ≥ n_1, as n → ∞.
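A short numerical illustration of the mechanism (with the concrete choice a_n = 1/n², so that na_n → 0): X_n + Y_n = U_n is discrete, so it cannot converge in total variation to a continuous limit even though Y_n → 0 in probability; meanwhile X_n charges a set of Lebesgue measure at most na_n = 1/n, so it stays at total variation distance at least 1 − 1/n from the uniform law on (0, 1], to which it nevertheless converges weakly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
a_n = 1.0 / n**2                      # concrete choice with n * a_n -> 0

size = 100_000
U = rng.integers(1, n + 1, size) / n  # uniform on {1/n, ..., n/n}
R = rng.uniform(0.0, a_n, size)      # independent uniform on [0, a_n]
X, Y = U + R, -R

# X_n + Y_n = U_n exactly: a purely discrete random variable
assert np.allclose((X + Y) * n, np.round((X + Y) * n))

# X_n lives on n intervals of total length n * a_n = 1/n, hence
# ||X_n - U(0,1)||_TV >= 1 - 1/n, yet X_n => U(0,1) weakly:
ks = np.max(np.abs(np.sort(X) - np.arange(1, size + 1) / size))
assert ks < 0.02   # empirical Kolmogorov distance to U(0,1) is small
```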

Global steps of the proofs of Theorem 2 and Theorem 3.
Our idea is to carry out an asymptotic expansion in ε by probabilistic methods. It turns out that the hyperbolic contracting nature of the underlying deterministic dynamics ϕ^x can be used to show that the correct first order expansion of X^{ε,x} in the sense of Freidlin-Wentzell [30, Chapter 2.2], given by the inhomogeneous Ornstein-Uhlenbeck process defined in (1.26), provides an asymptotic description of X^{ε,x} which is effective for time scales beyond the cutoff time scale.

Freidlin-Wentzell first order expansion.
It is not hard to see that for any η > 0 and t ≥ 0 the law of large numbers implies the limit stated above. In the sequel, we analyze the asymptotic fluctuations. Then the process (Z^{ε,x}_t)_{t≥0} is the unique strong solution of the stochastic differential equation above for any t ≥ 0. However, (1.24) has the same level of complexity as (1.6). Using (1.23) in (1.24) we derive the linear inhomogeneous approximation of (1.24) as follows. Let (Y^x_t)_{t≥0} be the unique strong solution of the linear inhomogeneous stochastic differential equation (1.25). Instead of (1.23) we claim the stronger result (a first order approximation in the sense of Section 2, Chapter 2 in [30]), where t^x_ε is given in (1.12). For a concise quantification of the approximation, see Lemma B.1 in Appendix B. Next, we define the first order approximation accordingly. It is not hard to see that for any ε there exists a limiting distribution µ^ε_* such that for any x ∈ R^d the process (Y^ε_t(x))_{t≥0} converges to µ^ε_* in the total variation distance as t tends to infinity. For further details see Lemma C.4 in Appendix C. Moreover, it is shown there that µ^ε_* is the unique invariant distribution of the homogeneous Ornstein-Uhlenbeck process and has a C^∞-density with respect to the Lebesgue measure on R^d. Note that (Y^ε_t(x))_{t≥0} satisfies the inhomogeneous equation above. Since we need to compare solutions of stochastic differential equations with different initial conditions, we introduce the following notation. Let T be a positive number and ξ be a given random vector on R^d. We assume that ξ is F_T-measurable for (F_t)_{t≥0} defined in Subsection 1.2.2. Let (Y^{ε,x}(t; T, ξ))_{t≥0} be the unique strong solution of the corresponding stochastic differential equation. In what follows, we always take T = T^x_ε. Then T^x_ε > 0 for 0 < ε ≪ 1.
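The first order approximation can be illustrated numerically. The Euler sketch below uses a hypothetical one-dimensional coercive drift b(x) = −x − x³ (not the paper's general b) and a small symmetric α-stable driver, and checks that the first order error X^ε_t − (ϕ^x_t + εY^x_t) is small compared to ε:

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(1)

# Hypothetical drift: coercive, with hyperbolic stable point at 0
b = lambda x: -x - x**3
Db = lambda x: -1.0 - 3.0 * x**2

alpha, eps, x0, T, nsteps = 1.8, 1e-3, 1.0, 2.0, 2000
dt = T / nsteps
# alpha-stable increments over dt via self-similarity: dL = dt^{1/alpha} S_alpha
dL = dt ** (1 / alpha) * levy_stable.rvs(alpha, 0.0, size=nsteps, random_state=rng)

X, phi, Y = x0, x0, 0.0
for k in range(nsteps):
    X = X + b(X) * dt + eps * dL[k]        # nonlinear system X^eps
    Y = Y + Db(phi) * Y * dt + dL[k]       # linear inhomogeneous O-U along phi
    phi = phi + b(phi) * dt                # deterministic flow phi^x

err = abs(X - (phi + eps * Y))             # first order error, expected o(eps)
assert err < 0.05
```

This is only a heuristic check of the expansion under the stated illustrative assumptions; the paper's actual comparison is the total variation coupling quantified in Lemma B.1, not a pathwise bound.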

Key cutoff estimate.
The proofs of the main results, Theorem 2 and Theorem 3, are based on the following fundamental inequality. On the one hand, note that for any ρ ∈ R and ε small enough the stated estimate holds. Roughly speaking, it turns out that the processes (X^ε_t(x))_{t≥0} and (Y^ε_t(x))_{t≥0} are close enough for time scales of order O(ln(1/ε)) in order to carry out the following quantitative coupling procedure. Since (X^ε_t(x))_{t≥0} and (Y^ε_t(x))_{t≥0} have different (inhomogeneous) drifts, couplings which dominate the total variation distance typically only hold for short time horizons. For an excellent introduction to the subject in the diffusive case we refer to [27]. Since the process (Y^ε_t(x))_{t≥0} is linear, the precise cutoff behavior (cutoff, window cutoff and profile cutoff) is derived from it in the spirit of [10]. However, it is inhomogeneous, so that the results of [10] cannot be applied directly; they are adapted in Subsection 2.1. Recall that µ^ε_* is the limiting distribution of the process (Y^ε(t; 0, x))_{t≥0}.
where τ x is given in Lemma 1.1 if and only if for any a > 0 the map The proof is given in Subsection 2.1 and relies on the Hartman-Grobman decomposition of Lemma 1.1. In what follows, we argue that the upper bound of inequality (1.29) tends to zero as ε → 0. To be precise, we show the following.
Proposition 2 (Error term E_1: nonlinear short time coupling). Assume Hypotheses 1, 2, 3 and 4 to be satisfied for α ∈ (3/2, 2) and β > 0. Let ∆_ε = ε^{α/2}. For any x ∈ R^d the stated limit follows. The complete proof can be found in Subsection 2.3 and is based on the local limit theorem for strongly locally layered stable Lévy measures on the short time scale ∆_ε → 0. The limitation to α ∈ (3/2, 2) is due to the tail integrability of the characteristic function of X^ε_{∆_ε}(x). It is of technical nature, but it seems difficult to remove.
Proposition 3 (Error term E_2: linear inhomogeneous coupling). Assume Hypotheses 1, 2, 3 and 4 to be satisfied for α ∈ (0, 2) and β > 0. Let ∆_ε = ε^{α/2}. The formal proof is given in Subsection 2.2 and relies on a version of the local limit theorem by [33] for strongly locally layered stable distributions and small times ∆_ε. We approximate the invariant distribution µ^ε of X^ε_·(x) by the limiting distribution µ^ε_* of the inhomogeneous Ornstein-Uhlenbeck process Y^ε_·(x) in the total variation distance. Proposition 4 (Error term E_3: equilibrium asymptotics). Assume Hypotheses 1, 2, 3 and 4 to be satisfied for α ∈ (3/2, 2) and β > 0. The stated limit follows. The proof is given in Subsection 2.4.
Proof of Theorem 2 and Theorem 3. We apply Propositions 2, 3 and 4 to the key estimate (1.29) and obtain the stated limit as ε → 0. Finally, Proposition 1 implies the main result in Theorem 2 and Theorem 3. Let ε ∈ (0, 1). By Lemma C.3 we see that µ^ε_* is the distribution of εZ_∞, where Z_∞ is the unique invariant distribution of the homogeneous Ornstein-Uhlenbeck process (Z_t)_{t≥0} given above. We start with the observation that Z_∞ is absolutely continuous. Indeed, let ζ_t be the characteristic function of Z_t and ζ_∞ be the characteristic function of Z_∞. By Theorem 3.1 in Sato and Yamazato [56] we have the stated identity for any t ≥ 0. Hence, |ζ_∞(θ)| ≤ |ζ_t(θ)| for all θ ∈ R^d. Then Item 3. in Section 4 of [10] implies that Z_∞ has a bounded C^∞-density with respect to the Lebesgue measure on R^d, where we take κ(v) = c_∢|v|^α in their notation, with c_∢ and α given in Lemma 1.7. In particular, Z_∞ is absolutely continuous. Remark 2.1. Note that for x = 0 we have ϕ^x_t = 0 for any t ≥ 0 and consequently D^{ε,x}(t) = 0. Since the right-hand side of inequality (2.2) does not depend on ε and tends to zero for t → ∞, we have for any time scale (s_ε)_{ε∈(0,1)} with s_ε → ∞, as ε → 0, the corresponding limit. Hence we continue with x ≠ 0 and recall that λ_x, ℓ_x, τ_x, θ^1_x, . . . , θ^m_x and v^1_x, . . . , v^m_x are the quantities given by the Hartman-Grobman decomposition in Lemma 1.1. By the triangle inequality we obtain the stated estimate, and analogously for D̃^{ε,x}. Combining (2.2) and (2.3) we obtain the key comparison. The limit (C.8) in Lemma C.4 shows the convergence above; in particular it holds for any ρ ∈ R. Claim: for any ρ ∈ R we have the stated limit. First, we see that scale and shift invariance imply the identity for all t ≥ 0. Secondly, the Hartman-Grobman decomposition in Lemma 1.1, together with the very definition of t^x_ε and w^x_ε, yields the corresponding expansion. Combining (2.7), (2.8) and the absolute continuity of Z_∞ with the Scheffé lemma implies that R^{ε,x}(t^x_ε + ρ · w^x_ε) tends to zero as ε → 0.
Joining (2.4), (2.5) and (2.6) yields that any cutoff phenomenon in the sense of Definition 1.8 can be read off from the simpler term D̃^{ε,x}.

Window cutoff for the inhomogeneous O-U process.
We observe that lim_{t→∞} v(t, x) may not exist in general. Set ω(x) to be the corresponding set of accumulation points. Then there exists a subsequence (t^x_{ε_j} + ρ · w^x_{ε_j})_{j∈N} of (t^x_ε + ρ · w^x_ε)_{ε∈(0,1]} such that ε_j → 0 as j → ∞ and for which D̃^{ε_j,x} converges. Notice that the sequence (v(t^x_{ε_j} + ρ · w^x_{ε_j} − τ_x))_{j∈N} is bounded by Σ_{k=1}^m |v_k|. Then the Bolzano-Weierstrass theorem yields the existence of a subsequence (ε_{j_n})_{n∈N} of (ε_j)_{j∈N} along which it converges to some v̄_ρ(x). By construction v̄_ρ(x) ∈ ω(x). Combining (2.8) and (2.9) and using that the law of Z_∞ is absolutely continuous with respect to the Lebesgue measure on R^d, Scheffé's lemma implies (2.10). Analogously, we deduce (2.11), where v̲_ρ(x) ∈ ω(x). Let ρ > 0. In the sequel, we send ρ → ∞. We observe that the upper limiting vector v̄_ρ(x) depends on ρ; however, it is uniformly bounded. Combining (2.12) and (2.13) shows the window cutoff limits for D̃^{ε,x} and hence the window cutoff for the family (Y^ε(x))_{ε∈(0,1)}.

Profile cutoff for the inhomogeneous O-U process.
Recall D̃^{ε,x}, where v(t, x) = Σ_{k=1}^m e^{iθ^k_x t} v^k_x and λ_x, ℓ_x, τ_x, θ^1_x, . . . , θ^m_x and v^1_x, . . . , v^m_x are the quantities given in the Hartman-Grobman decomposition in Lemma 1.1. By (2.10) and (2.11) we have for any ρ ∈ R the stated bound on lim sup_{ε→0} D̃^{ε,x}. We start with the necessary condition for profile cutoff in Theorem 3. If for any a > 0 the map above converges, then the limit can be expressed in terms of any representative v of ω(x). This yields the desired profile cutoff for the family (Y^{ε,x})_{ε∈(0,1)}.
We continue with the sufficient condition for profile cutoff in Theorem 3. Let v ∈ ω(x), i.e. there exists a subsequence (t_j)_{j∈N} realizing v as a limit. For any x ∈ R^d and ρ ∈ R consider the parametrization ε ↦ t^x_ε + ρ · w^x_ε − τ_x and set t_j := t^x_{ε_j} + ρ · w^x_{ε_j} − τ_x for all j ∈ N. Limit (2.8) and Scheffé's lemma imply the corresponding convergence. Since we are assuming profile cutoff, the claim follows.

Coupling for the inhomogeneous O-U processes (Proposition 3).
2.2.1. Coupling by the local limit theorem for locally layered stable drivers.
We keep the notation introduced in Subsection 1.4. Let ∆_ε = ε^{α/2}. For any ρ ∈ R and x ≠ 0, recall that T^x_ε = t^x_ε − ∆_ε + ρ · w^x_ε, where t^x_ε and w^x_ε are given in Theorem 2. For x = 0 any time T^x_ε = O(|ln(ε)|^2) can be taken (see Lemma 2.3). We show the limit stated above. We recall that (ϕ^x_t)_{t≥0} is the solution of (1.3). By (1.28), the variation of constants formula yields the explicit representation in terms of (Φ^ε_t(x))_{t≥0}, the solution of the matrix-valued inhomogeneous differential equation. Since ϕ^x_{T^x_ε+t} → 0, as ε → 0, U^x_ε resembles the respective homogeneous Ornstein-Uhlenbeck process. We claim that there exists a scale γ_ε (independent of x) and a deterministic vector a^x_ε such that γ_ε U^x_ε + a^x_ε converges in total variation distance to an absolutely continuous random vector as ε → 0. To be precise, we state this as a proposition below.
Remark 2.2. Assume that the Lévy measure ν is locally layered stable in the sense of Definition 1.4 with parameters (ν_0, ν_∞, Λ, q, c_0, α). Let α ∈ (0, 2) and β > 0, where α is given in Definition 1.4 and β is given in Hypothesis 2. It is not hard to adapt the proof of Theorem 3.1 in [33] to deduce the stated convergence, where S_α(Λ_1) is a strictly α-stable process with spectral measure Λ_1(dθ) = c_0(θ)Λ(dθ). If, in addition, we assume (1.7) and (1.8), then c_0 is a symmetric function and therefore so is Λ_1. The vectors η_{α,β} and b_{α,β} are explicit and their formulas are given in the statement of Theorem 3.1 in [33]. To be precise, the authors in [33] state the stronger tail condition (3.3) on the Lévy measure ν. However, in their proof of Theorem 3.1 in [33], which treats the short-range behavior, it is only used to guarantee the following (according to their notation): for f a bounded continuous function vanishing in a neighbourhood of the origin, and h > 0, ε > 0, the iterated integral below is bounded independently of h

In our setting of Definition 1.4, it is bounded by
which is finite for any ε ∈ (0, 1).

Proposition 5 (Local limit theorem for the inhomogeneous O-U approximation).
Assume that ν is a strongly locally layered stable Lévy measure in the sense of Definition 1.4 with parameters (ν_0, ν_∞, Λ, q, c_0, α). Let α ∈ (0, 2) and β > 0, where α and β are given in Definition 1.4. Then for any K > 0 the stated convergence holds, where γ_ε := ∆_ε^{−1/α}, the random vector U has a symmetric α-stable distribution with spectral measure Λ_1(dθ) = c_0(θ)Λ(dθ), and the deterministic vector a^x_ε is given by the formula below. In particular, the limit also holds for x = 0. Proof of Proposition 5. By the continuity shown in Lemma C.2 we have for any ε > 0 a maximizing point x_ε. In the sequel, we show that the right-hand side tends to 0 as ε → 0. For simplicity, we drop the ε-dependence of x_ε, which we denote by x. We stress that in the proof below the dependence on x only enters in terms of |x|, which is uniformly bounded by K.
We show the existence of the distributional limit lim_{ε→0}(γ_ε U^x_ε + a^x_ε) for a suitable deterministic scale γ_ε with lim_{ε→0} γ_ε = ∞ and a deterministic vector a^x_ε. By (2.18) and since the process L is additive, it is not hard to deduce that its characteristic function has the shape displayed above. The translation invariance of the Lebesgue integral in the preceding exponent implies a decomposition into two terms. We start with the second term. Since |ϕ^x_t| ≤ |x| for any t ≥ 0, we obtain the stated bound, where the last inequality follows from inequality (A.3) in Lemma A.3. Since γ_ε = ∆_ε^{−1/α}, we obtain the convergence towards S_α(Λ_1), a symmetric α-stable process with spectral measure Λ_1. Since ∆_ε → 0, Slutsky's lemma applies. As a consequence, the right-hand side of (2.22) tends to zero, as ε → 0. We continue with the first term J_1. Since J_1 = L_{∆_ε}, limit (2.23) implies the corresponding convergence, where η_{α,β} and b_{α,β} are deterministic vectors on R^d, and U has a symmetric α-stable distribution with spectral measure Λ_1. By (2.19) and (2.20) we obtain the claimed limit, where a^x_ε is given in (2.19). We stress that the dependence on x in the preceding limit only enters via C(|x|) in (2.22) and holds uniformly for |x| ≤ K.
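The scale γ_ε = ∆_ε^{−1/α} behind the term J_1 = L_{∆_ε} can be checked numerically in the purely stable case: for a strictly symmetric α-stable driver (illustrative α, no layered perturbation), self-similarity gives ∆^{−1/α} L_∆ = L_1 in distribution. The sketch below compares rescaled small-time increments with the standard α-stable law via a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy.stats import levy_stable, kstest

rng = np.random.default_rng(2)

alpha, Delta = 1.7, 1e-3           # illustrative values
# increments L_Delta of a symmetric alpha-stable process (scale Delta^{1/alpha})
incr = levy_stable.rvs(alpha, 0.0, scale=Delta ** (1 / alpha),
                       size=1000, random_state=rng)
rescaled = Delta ** (-1 / alpha) * incr   # gamma_eps * L_{Delta_eps}

# the rescaled increments match the standard alpha-stable law exactly
res = kstest(rescaled, lambda t: levy_stable.cdf(t, alpha, 0.0))
assert res.pvalue > 1e-3
```

For genuinely layered stable drivers the analogous statement only holds in the short-time limit, with the correction vectors η_{α,β} and b_{α,β} of Theorem 3.1 in [33]; the check above covers only the exactly stable situation.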
We strengthen the convergence in distribution in (2.26) to convergence in total variation distance, using the regularity of the densities and showing their convergence in L^1(R^d). This can be carried out using the Fourier inversion formula for the explicit characteristic function of the linear process γ_ε U^x_ε + a^x_ε and the Orey-Masuda condition in Lemma 1.7, analogously to the proof of Lemma C.4 in Appendix C.3.2. Since this procedure is spelt out in full detail in Lemma C.4 for the limit treated there, we omit the details. In the sequel, we derive an upper bound in total variation with the help of Proposition 5, which tends to zero as ε → 0.

For short, let
The shift and scale invariance of the total variation distance and representation (2.16) yield the stated identity, where a^x_ε is given in (2.19) and γ_ε is given in Proposition 5. The triangle inequality yields the corresponding bound, where U has the S_α(Λ_1) distribution given in Proposition 5. The independence of the increments of L yields that (Φ^ε_t(x))^{−1}z and (Φ^ε_t(x))^{−1}z̃ are independent of U^x_ε and U, respectively. Then we apply the cancellation property of independent shifts of the total variation given in part ii) of Lemma A.2 of [10] and obtain

(2.28)
We prove that the right-hand side of the preceding inequality tends to zero as ε → 0. By Proposition 5 it remains to prove the stated limit. Let P^x_ε(du, dũ) denote the joint probability measure P(X^ε_{T^x_ε}(x) ∈ du, Y^{ε,x}(T^x_ε; 0, x) ∈ dũ) and keep the notation z = X^ε_{T^x_ε}(x). Since z and z̃ are nondegenerate and mutually dependent random variables, the shift property for the total variation distance cannot be applied directly. Nevertheless, the Markov property and the shift invariance allow us to disintegrate P^x_ε as follows. We continue with the following split: for any η > 0 we have the stated decomposition. We start with the second term on the right-hand side of (2.30). Since the shift operator is continuous at 0 in L^1(R^d), for any ρ > 0 there exists η = η(ρ) > 0 such that the corresponding bound holds. By Lemma A.3 we obtain |(Φ^ε_{∆_ε})^{−1}(x)| ≤ √d for any ε ∈ (0, 1] and x ∈ R^d, where, in abuse of notation, | · | also denotes the standard matrix 2-norm. By Hypothesis 1 the event {γ_ε|ζ − ζ̃| ≤ ηε} implies (γ_ε/ε)|(Φ^ε_{∆_ε}(x))^{−1}(ζ − ζ̃)| ≤ √d η for any ε.

Nonlinear short-time coupling (Proposition 2).
We keep the notation introduced in Subsection 1.4. Let ∆_ε = ε^{α/2}. For any ρ ∈ R, recall the definition of T^x_ε, where t^x_ε and w^x_ε are given in Theorem 2. We show the following.
We recall that (ϕ^x_t)_{t≥0} is the solution of (1.3). By (1.28), the variation of constants formula yields the explicit representation, where (Φ^ε_t(x))_{t≥0} is the solution of the matrix-valued inhomogeneous differential equation given in (2.17) and the random vector U^x_ε is defined by (2.18). For any z ∈ R^d we consider the unique strong solution (Z^ε_t(z))_{t≥0} of the equation above. The variation of constants formula yields the representation with Ψ_t = e^{Db(0)t}, t ∈ R. It is easily seen that Ψ^{−1}_t = Ψ_{−t}. We start with the estimate below.

2.3.1.
Step 1: Domination of the error term G 2 .
We estimate the second term on the right-hand side. By disintegration combined with the translation and scale invariance of the total variation distance, we obtain the stated bound, where P^x_ε(dz) := P(X^ε_{T^x_ε}(x) ∈ dz). By Proposition 5 there exists a random variable U with distribution S_α(Λ_1) and the deterministic vector a^x_ε ∈ R^d defined in (2.19) satisfying the stated convergence. Repeating the same argument of Proposition 5, there exists a random variable Ũ with distribution S_α(Λ_1) and the deterministic vector a^0_ε ∈ R^d given accordingly. We define the deterministic function and the pivotal terms as above. Scale and shift invariance of the total variation distance combined with the triangle inequality yield the decomposition. Estimate of B^ε_2(z) in (2.41): By the cancellation property of the total variation distance for independent increments, the corresponding term vanishes as ε → 0, due to (2.39). As a consequence of (2.38) and Proposition 5 we obtain the stated limits. We estimate (2.42). For the first term on the right-hand side, Lemma A.1 provides a positive constant C(|x|) depending continuously on |x| such that (2.43) |ϕ^x_{T^x_ε}| ≤ C(|x|)ε for all ε ≪ 1. With the help of inequality (A.3) in Lemma A.3, the mean value theorem and the fact that |ϕ^x_t| ≤ |x|, t ≥ 0, we obtain the stated bound. Since ∆_ε = ε^{α/2}, both preceding terms on the right-hand side tend to zero as ε → 0. We continue with the second term on the right-hand side of (2.42). By Lemma A.3.v) we have for ε sufficiently small the corresponding estimate, where C_1(|x|) is a constant that depends continuously on |x|. Then for small values of ∆_ε we obtain the stated bound. Since ∆_ε = ε^{α/2}, we have |a_ε − a^0_ε| → 0, as ε → 0, and consequently by the Scheffé lemma we obtain ‖(a_ε − a^0_ε + U) − U‖_TV → 0, as ε → 0. With the same reasoning we get the analogous limit. For ϑ ∈ (0, 1/4) we define r_ε := ε^{1−ϑ} and estimate accordingly. By Lemma D.5 we control the second term. We continue with the first term of the right-hand side of the preceding inequality. Recall that U and Ũ are S_α(Λ_1) distributed, and that Γ^x_ε was defined in (2.37).
By Lemma A.3.v) there exists a positive constant C(|x|) depending continuously on |x| such that the stated bound holds. The preceding inequality combined with inequality (2.43) yields, for ε sufficiently small, the corresponding estimate. It remains to estimate the last term, where the final inequality follows from Lemma A.1. As a consequence we obtain the bound above. The shift continuity in L^1 and the compactness of the Euclidean closed ball imply the stated convergence. Since ∆_ε = ε^{α/2}, the preceding inequality combined with estimates (2.44) and (2.45) yields the claim. This finishes the proof of Step 1.

2.3.2.
Step 2: Domination of the error term G 1 up to a term in distribution.
By Theorem 1 in [61], we have the following almost sure estimate. Recall that by the Lévy-Itô decomposition ([55], Chapter 4), the driving noise process (L_t)_{t≥0} under Hypothesis 3 has the stated representation as Poisson random integrals, where N is the Poisson random measure associated to the Lévy measure ν on R^d \ {0} and Ñ is its compensated counterpart. In particular, we have the representation of the quadratic variation of X^ε as a Poisson random integral of ⟨H^ε_{s−}(z), εu⟩ against N(ds du).
Since b(0) = 0, Hypothesis 1 yields the stated bound. We continue term by term. Firstly, we estimate the first term, then the second term (2.54). Finally, for β ≥ 1 we obtain the stated bound. For β ∈ (0, 1), using the subadditivity of the root for sums of nonnegative terms (see [53]), Markov's inequality and Hypothesis 2, we obtain the claim. This finishes the proof of (2.51) and hence implies (2.50).
In this step we prove that G_1 → 0 as ε → 0. More precisely, we show for ∆_ε = ε^{α/2}, α ∈ (3/2, 2) and β > 0 the stated limit. In what follows we strengthen the result of Step 2 to convergence in the total variation distance via the following localization procedures to bounded jumps and bounded vector fields. We stress that the following two lemmas are true in full generality, in particular for any α ∈ (0, 2) and β > 0.
Since T ε > ∆ ε with high probability, we can assume without loss of generality the presence of only bounded jumps even in the total variation distance.
(1) We stress the following conscious abuse of notation. Lemma 2.4 yields that it is enough to prove (2.66) for X^ε replaced by X^{ε,T_ε}. In other words, we may assume that X^ε has bounded jumps beforehand and consequently all polynomial moments finite. (2) In addition, Lemma 2.5 allows us to consider bounded vector fields in the spirit of Section 4 in [26]. That is to say, it is enough to prove (2.66) for X^ε replaced by X^ε_t(z) = X^{ε,T^ε}_t(z).
Proof. For simplicity we keep the same notation, except for the driving noise, which we denote by L̃, and set L̃ accordingly. Note that since Ψ_t = e^{Db(0)t}, we obtain the stated identity, where h is given in (2.63). Note that the limit (2.51) is shown for X^ε. It is easily seen, going through the proof line by line, that it remains valid for X^ε replaced by the localized process, i.e., the analogue of (2.67) holds. In the sequel, we strengthen the convergence of (2.67) to convergence in the total variation distance. Since L̃ has absolutely continuous marginals and X^ε_{∆_ε} is a continuous push-forward of L̃, it retains the absolute continuity property. In addition, it is not hard to see that Lemma 1.7 yields a C^∞-density for S_α(Λ_1). Hence it is enough to prove the L^1-convergence, where f_ε is the density of X^ε_{∆_ε}(z)/(ε∆_ε^{1/α}) + a_ε and f_0 is the density of S_α(Λ_1). By Scheffé's lemma for densities it is sufficient to show that f_ε → f_0, as ε → 0, Lebesgue almost everywhere in R^d. For this sake, it is sufficient to prove the convergence of the characteristic functions as ε → 0.
Since f_ε, f_0 ∈ L^2(R^d), by Plancherel's identity there is a positive constant C_π such that the stated bound holds. Since the weak convergence (2.67) implies that f̂_ε → f̂_0 uniformly on compacts, we obtain the corresponding estimate on any compact set. The exponential decay of f̂_0 yields f̂_0 ∈ L^2(R^d). Sending K to infinity, it remains to show that the right-hand side of the preceding inequality is 0. Recall the differential version of (2.68) with initial datum X^ε_0(z) = 0. In the sequel, we calculate φ_t(θ) := E[e^{i⟨θ, X^ε_t(z)⟩}]. Itô's formula yields the stated identity. Since the process X^ε has finite first moment, taking expectations and using Fubini's theorem we obtain the corresponding integral equation, where ψ is the characteristic exponent associated with the Lévy measure ν. We set θ_ε := θ/(ε∆_ε^{1/α}) for |θ| ≤ K. For the real and the imaginary part of φ_t(θ) we have the stated equalities. The chain rule for the respective differential forms applies, with E sin⟨θ_ε, X^ε_0(z)⟩ = 0. We sum up the preceding equations and obtain the stated bound on the time derivative. For the third term we infer analogously. Since α ∈ (3/2, 2) we obtain the stated bounds for sufficiently small ε, |θ| ≤ K and |z| ≤ r_ε. Taylor's theorem combined with the jump-size and spatial localizations yields, for ε sufficiently small, the corresponding estimate, where C_2 > 0 only depends on max_{|u|≤2}|b(u)|, max_{|u|≤2}|Db(u)| and max_{|u|≤2}|D²b(u)|. Hence the variation of constants formula yields the stated bound. Note that the shift a^0_ε does not change the modulus of f̂_ε(θ) and hence not the integrability in θ. The parameter value α ∈ (3/2, 2) implies the required integrability. Since ∆_ε = ε^{α/2}, we have the desired limit (2.69) for any |z| ≤ r_ε. In order to see the uniformity, we refer to the continuity of the map; that is, the supremum is attained at some value z_ε with |z_ε| ≤ r_ε. In the previous calculation the only property of z we used is that |z| ≤ r_ε. Hence all previous results remain valid for z replaced by z_ε.
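The inversion step can be illustrated in one dimension: the density of a symmetric α-stable law is recovered from its integrable characteristic function f̂(θ) = e^{−|θ|^α} by Fourier inversion, matching scipy's reference density (illustrative α only; the rescaled characteristic function f̂_ε above plays the same role in the proof):

```python
import numpy as np
from scipy.stats import levy_stable

alpha = 1.7
theta = np.linspace(-60.0, 60.0, 24001)
dtheta = theta[1] - theta[0]
phi_hat = np.exp(-np.abs(theta) ** alpha)   # characteristic function of S_alpha

def f_inv(z):
    # f(z) = (2 pi)^{-1} int e^{-i theta z} phi_hat(theta) d theta (real part)
    return np.sum(np.cos(theta * z) * phi_hat) * dtheta / (2.0 * np.pi)

zs = np.array([0.0, 0.5, 1.0, 2.0])
dens = np.array([f_inv(z) for z in zs])
ref = levy_stable.pdf(zs, alpha, 0.0)
assert np.allclose(dens, ref, atol=1e-3)
```

The rapid decay of f̂ for α > 3/2 is what makes the quadrature above converge quickly; it is the one-dimensional analogue of the tail integrability used in the proof.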
By (2.46) this finishes the proof of Proposition 2.

Inhomogeneous O-U approximation of the limiting distribution (Prop. 4).
Let x_0 ∈ R^d and t > 0. The triangle inequality yields (2.71). Here, we estimate the first term of the right-hand side of inequality (2.71). By disintegration and the invariance property of µ^ε the stated identity follows. Let s_ε ≫ t^{x_0}_ε for sufficiently small ε. The triangle inequality for the total variation distance implies the stated bound. Since the total variation distance is bounded by one, the corresponding estimate holds. Combining the preceding inequalities with inequality (2.71), we obtain the stated bound for any x_0 ∈ R^d and s_ε ≫ t^{x_0}_ε for sufficiently small ε > 0.
Estimates for I_1 in (2.72): Let x_0 ≠ 0. By Proposition 1 we have the stated limit. By (2.1) and Lemma C.2 we obtain the corresponding bound for any K > 0. Estimates for I_2 in (2.72): We estimate the second term as follows. By Proposition B.1 our estimates in the previous sections remain valid up to times of order ε^{−ϑ} for some ϑ > 0. In the sequel, we set s_ε := ln²(ε).
We start with the second term on the right-hand side of (2.74) and lighten the notation. By Proposition B.1 it is not hard to see that Lemma 2.3 remains valid in this setting. For the convenience of the reader, we restate it here.
In the sequel, we continue with the first term on the right-hand side of (2.74). By Corollary D.5 we have the stated bound for any η > 0, ϑ ∈ (0, 1) and K > 0. Since s_ε − ∆_ε ≫ t^{x_0}_ε, it is straightforward to see that the limit (2.57) remains valid for T^x_ε replaced by s_ε − ∆_ε. Consequently, we obtain the corresponding limit. Estimates for I_3 in (2.72): By Corollary D.4 we have for all β′ ≤ β ∧ 1 a positive constant C such that the stated moment bound holds. Therefore, by the dominated convergence theorem, we infer lim_{t→∞} ∫_{R^d} (e^{−δβ′t}|u|^{β′} ∧ n) µ^ε(du) = 0 for all n ∈ N, ε ∈ (0, 1].
Therefore, the stated bound holds for all n ∈ N with n > Cε. By the monotone convergence theorem we obtain the corresponding limit. By (2.76) and the Markov inequality the claim follows. Estimates for I_4 in (2.72): We start with the first term. Recall that r_ε = ε^{1−ϑ} for ϑ ∈ (0, 1/4). For P^x_ε(dz) = P(X^ε_{s_ε−∆_ε}(x) ∈ dz), disintegration yields the stated bound for some |u_ε| ≤ K and |z_ε| ≤ 2r_ε. The right-hand side of the preceding inequality tends to zero, as ε → 0, due to Proposition 2 and Corollary D.5. We continue with the second term on the right-hand side of (2.77). Using the shift continuity (2.31) we fix ρ and choose η > 0 accordingly. Again, by disintegration we obtain the stated bound. We prove that the right-hand side of the preceding inequality tends to zero, as ε → 0. Due to limit (2.75) the first contribution vanishes. By Corollary D.5 and a straightforward adaptation for the linearization Y^{ε,u}, we control the remaining contribution. We continue with the term below.

By (2.36) we have
By the shift and scale invariance of the total variation distance we obtain the stated identity. In the sequel we estimate, for η > 0, the corresponding split. Recall that (2.31) yields that (2.78) is bounded from above by ρ. Sending first ε → 0 and then ρ → 0 yields that the limit of (2.78) equals 0.
It remains to show This, however, is the result of Proposition 5.
Estimates for I_5 in (2.72): We start with the triangle inequality. The second term of the preceding inequality is equal to I_1 and tends to 0 as ε → 0. By (2.1), and since Lemma C.3, item (1), shows that µ^ε_* is the law of εZ_∞, we have ‖Y^{ε,u}(s_ε; 0, u) − µ^ε_*‖_TV ≤ ‖Y^{ε,u}(s_ε; 0, u) − Z_∞‖_TV + ‖ϕ^u_{s_ε}/ε + Z_∞ − Z_∞‖_TV for any u ∈ R^d.
We start with the first term. By Lemma C.4 we have the stated limit. We treat the second term. Let η > 0. The shift continuity of the L^1 distance yields that there exists ρ := ρ(η) > 0 such that the corresponding bound holds. Note that for |u| ≤ K we have |ϕ^u_{s_ε}|/ε ≤ e^{−δs_ε}|u|/ε ≤ e^{−δs_ε}K/ε < ρ for sufficiently small ε.
Therefore the lim sup vanishes, and consequently the claim follows, where t^x_ε and w^x_ε are given in Theorem 2. Then there exists a positive constant C(|x|, ρ) that depends continuously on |x| such that |ϕ^x_{T^x_ε}| ≤ C(|x|, ρ)ε. Proof. By Lemma 1.1 we have the stated representation, where v(t, x) = Σ_{k=1}^m e^{iθ^k_x t} v^k_x. A straightforward calculation shows the desired estimate. Then the triangle inequality yields the claim, where the last inequality follows from limit (A.1) and limit (A.2).

Lemma A.2 (Gronwall-Bellman inequality).
Let T > 0 be fixed. Let g : [0, T] → R be a C^1-function and h : [0, T] → R be continuous. Assume the differential inequality above holds, where a ∈ R, and the derivatives at 0 and T are understood as the right and left derivatives, respectively. Then the stated bound follows. For the proof, see for instance Theorem 1.3.3, page 15, of [49].
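A quick numerical sanity check of the inequality (illustrative choices: a = −1, h(s) = cos s, and g solving the corresponding ODE with a small negative defect, so that the hypothesis g′ ≤ ag + h holds strictly):

```python
import numpy as np

a, T, N = -1.0, 5.0, 5001
t = np.linspace(0.0, T, N)
dt = t[1] - t[0]
h = np.cos(t)

# g'(t) = a g(t) + h(t) - 0.1  <=  a g(t) + h(t): hypothesis holds strictly
g = np.empty(N)
g[0] = 1.0
for k in range(N - 1):
    g[k + 1] = g[k] + dt * (a * g[k] + h[k] - 0.1)

# Gronwall-Bellman bound: g(t) <= e^{a t} g(0) + int_0^t e^{a(t-s)} h(s) ds
integrand = np.exp(-a * t) * h
I = np.concatenate(([0.0], np.cumsum(0.5 * dt * (integrand[:-1] + integrand[1:]))))
bound = np.exp(a * t) * (g[0] + I)
assert np.all(g <= bound + 1e-6)
```

The bound is attained exactly when the differential inequality is an equality, which is why the 0.1 defect makes the comparison strict away from t = 0.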
Lemma A.3. Let (ϕ x t ) t 0 be the solution of (1.6). We consider for any fixed T 0 the solution Φ = (Φ t (x)) t 0 of the matrix differential equation the solution Ψ = (Ψ t ) t 0 of the matrix differential equation and the standard matrix 2-norm | · |. Then the following statements are valid for any 0 s t.
i) The stated identity follows. ii) For C(|x|) = max_{|u|≤|x|} |Db(u)| we have the corresponding bound. In particular, the chain rule and Hypothesis 1 yield the stated integral inequality.
We apply Hypothesis 1 and obtain the almost sure bound. We continue term by term. First we obtain the stated estimate, where we have used the subadditivity of the power β ∧ 1 in the sense of Subsection 1.1.2; see formula (1.6) in [53]. Optimizing over θ we obtain θ = 1/(1 + 2(β ∧ 1)).
Appendix C. The linear inhomogeneous dynamics C.1. β-Hölder continuity of the characteristic exponent of a Lévy process.
It is classical that β ≥ 1 in Hypothesis 2 implies that the characteristic function is continuously differentiable, and hence locally Lipschitz continuous. This remains valid for the characteristic exponent ψ. In the sequel, we provide an elementary proof for the respective fractional case β ∈ (0, 1).
Then we have the following.
(1) Note that the above calculations for f_1 give an elementary proof of the fact that any pure jump Lévy process with uniformly bounded jump sizes has a globally Lipschitz continuous characteristic exponent ψ. (2) The calculations for f_2 yield that any compound Poisson process with β-integrability ∫_{|z|>1} |z|^β ν(dz) < ∞ for some β > 0 has a locally Hölder continuous characteristic exponent ψ with Hölder index β. This extends to fractional moments the well-known result that the existence of integer moments translates into the respective order of differentiability of the characteristic function.
C.2. Continuous dependence of the total variation in the nonlinearity.
In the sequel we extend Theorem 4.1 in [50] to L^β for arbitrary β > 0.

Theorem 4 (Exponential ergodicity).
Under the standing assumptions and the Hörmander condition (D.2) there exists a unique invariant distribution µ for (D.1) satisfying exponential ergodicity in the total variation distance.
In addition, we have the following. With the help of the preceding calculations, Itô's formula yields an expression for |X_t|^γ_c starting from |x|^γ_c. Hypothesis 1, together with computations analogous to the proof of Condition LC in Theorem 4, yields that for A = εI_d, c = ε and 0 < γ ≤ 1 ∧ β there is a constant C > 0 such that E|X^{ε,x}_t|^γ ≤ Cε^γ for all ε ∈ (0, 1], x ∈ R^d and t ≥ 0. Using the subadditivity of the γ-power we obtain (D.11).
Corollary D.5. For any x ∈ R^d and 0 < γ ≤ β ∧ 1 there exists a positive constant C = C(|x|, γ) such that for all ϑ ∈ (0, 1) and ε ∈ (0, 1) the stated bound holds. Proof. By Corollary D.4 and Lemma A.1 we have the corresponding estimate for some positive constants C_1 and C_2(|x|). Combined with Markov's inequality, the preceding inequality yields the claim.