A penalized bandit algorithm

We study a two-armed bandit algorithm with penalty. We show the convergence of the algorithm and establish its rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a discontinuous Markov process.


Introduction
In a recent joint work with P. Tarrès (see [12]), we studied the convergence of the so-called two-armed bandit algorithm. The purpose of the present paper is to investigate a modified version of this algorithm, in which a penalization is introduced. In the terminology of learning theory (see [14, 15]), the algorithm studied in [12] was a Linear Reward-Inaction (LRI) scheme, whereas the one we want to introduce is a Linear Reward-Penalty (LRP) procedure.
In our previous paper, the algorithm was introduced in a financial context as a procedure for the optimal allocation of a fund between two traders who manage it. Imagine that the owner of a fund can share his wealth between two traders, say A and B, and that, every day, he can evaluate the results of one of the traders and, subsequently, modify the percentage of the fund managed by both traders. Denote by X_n the percentage managed by trader A at time n (X_n ∈ [0, 1]). We assume that the owner selects the trader to be evaluated at random, in such a way that the probability that A is evaluated at time n is X_n, in order to select preferably the trader in charge of the greater part of the fund. In the LRI scheme, if the evaluated trader performs well, its share is increased by a fraction γ_n ∈ (0, 1) of the share of the other trader, and nothing happens if the evaluated trader performs badly. Therefore, the dynamics of the sequence (X_n)_{n≥0} can be modelled as follows:

X_{n+1} = X_n + γ_{n+1} (1 − X_n) 1_{{U_{n+1} ≤ X_n} ∩ A_{n+1}} − γ_{n+1} X_n 1_{{U_{n+1} > X_n} ∩ B_{n+1}},

where (U_n)_{n≥1} is an iid sequence of uniform random variables on the interval [0, 1], and A_n (resp. B_n) is the event "trader A (resp. trader B) performs well at time n". We assume P(A_n) = p_A, P(B_n) = p_B, for n ≥ 1, with p_A, p_B ∈ (0, 1), and independence between these events and the sequence (U_n)_{n≥1}. The point is that the owner of the fund does not know the parameters p_A, p_B. This recursive learning procedure has been designed to assign asymptotically the whole fund to the best trader. This means that, if say p_A > p_B, X_n converges to 1 with probability 1 provided X_0 ∈ (0, 1) (if p_A < p_B, the limit is 0, with symmetric results). However, this "infallibility" property requires some very stringent assumptions on the reward parameter γ_n (see [12]). Furthermore, the rate of convergence of the procedure toward either its "target" 1 or its "trap" 0 is not ruled by a CLT with rate √γ_n like standard stochastic approximation algorithms (see [10]). It is shown in [11] that this rate is quite non-standard, strongly depends on the (unknown) values p_A and p_B and becomes very poor as these probabilities get close to each other.
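The verbal description of the LRI scheme above can be sketched numerically. The update rule below is our reading of that description, and the step sequence γ_n = 1/(n + 10) is purely illustrative:

```python
import random

def lri_step(x, gamma, p_a, p_b, rng):
    """One LRI update: A is evaluated with probability x; a successful
    evaluated trader gains a fraction gamma of the other's share,
    nothing happens on failure."""
    if rng.random() <= x:            # trader A is evaluated
        if rng.random() < p_a:       # A performs well
            x += gamma * (1.0 - x)
    else:                            # trader B is evaluated
        if rng.random() < p_b:       # B performs well
            x -= gamma * x
    return x

def run_lri(x0, p_a, p_b, n_steps, seed=0):
    """Run the LRI recursion with the illustrative choice gamma_n = 1/(n+10)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        x = lri_step(x, 1.0 / (n + 10), p_a, p_b, rng)
    return x
```

With p_A clearly larger than p_B, most runs end close to 1, but the scheme can still be driven toward 0 by unlucky early draws; this is the fallibility issue discussed in the text.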
In order to improve the efficiency of the algorithm, one may introduce a penalty when the evaluated trader has unsatisfactory performances. More precisely, if the evaluated trader at time n performs badly, its share is decreased by a penalty factor ρ_n γ_n. This leads to the following LRP (or "penalized two-armed bandit") procedure:

X_{n+1} = X_n + γ_{n+1} [(1 − X_n) 1_{A_{n+1}} − ρ_{n+1} X_n 1_{A^c_{n+1}}] 1_{{U_{n+1} ≤ X_n}} + γ_{n+1} [ρ_{n+1} (1 − X_n) 1_{B^c_{n+1}} − X_n 1_{B_{n+1}}] 1_{{U_{n+1} > X_n}},

where the notation A^c is used for the complement of an event A. The precise assumptions on the reward rate γ_n and the penalty rate γ_n ρ_n will be given in the following sections. The paper is organized as follows. In Section 1, we discuss the convergence of the sequence (X_n)_{n≥0}. First we show that, if ρ_n is a positive constant ρ, the sequence converges with probability one to a limit x*_ρ ∈ (0, 1) satisfying x*_ρ > 1/2 if and only if p_A > p_B, so that, although the algorithm manages to distinguish which trader is better, it does not assign the whole fund to the best trader. To get rid of this limitation, we consider a sequence (ρ_n)_{n≥1} which goes to zero, so that the penalty rate becomes negligible with respect to the reward rate (γ_n ρ_n = o(γ_n)). This framework seems new in the learning theory literature. We are then able to show that the algorithm is infallible, i.e., if p_A > p_B, then lim_{n→∞} X_n = 1 almost surely, under very light conditions on the reward rate γ_n (and on ρ_n). From a stochastic approximation viewpoint, this modification of the original procedure has the same mean function and time scale (hence the same target and trap, see (5)), but it always keeps the algorithm away from the trap without adding noise at these equilibria. In fact, it was necessary not to add noise at these points in order to remain inside the domain [0, 1].
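A quick simulation contrasts the two penalty regimes discussed above. The update rule is our reading of the LRP description (a badly performing evaluated trader loses a fraction ρ_n γ_n of its own share), and the sequences γ_n = (n+10)^{-0.6}, ρ_n = n^{-0.3} are illustrative choices of the kind (a + r < 1) used later in the paper:

```python
import random

def lrp_step(x, gamma, rho, p_a, p_b, rng):
    """One LRP update: reward as in LRI; a badly performing evaluated
    trader loses a fraction rho*gamma of its own share."""
    if rng.random() <= x:                    # trader A is evaluated
        if rng.random() < p_a:
            x += gamma * (1.0 - x)           # reward A
        else:
            x -= gamma * rho * x             # penalize A
    else:                                    # trader B is evaluated
        if rng.random() < p_b:
            x -= gamma * x                   # reward B
        else:
            x += gamma * rho * (1.0 - x)     # penalize B
    return x

def run_lrp(x0, p_a, p_b, n_steps, rho_of_n, seed=0):
    """Run the LRP recursion with the illustrative gamma_n = (n+10)**-0.6."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        x = lrp_step(x, (n + 10) ** -0.6, rho_of_n(n), p_a, p_b, rng)
    return x
```

With a constant penalty ρ the runs settle near an interior point x*_ρ < 1, whereas with ρ_n = n^{-0.3} → 0 they approach 1, in line with the dichotomy described above.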

1 Convergence of the LRP algorithm

Some classical background on stochastic approximation
We will rely on the ODE lemma recalled below, for a stochastic procedure (Z_n) taking its values in a given compact interval I.
Theorem 1 (a) Kushner & Clark's ODE lemma (see [9]): Let g : I → R be such that Id + g leaves I stable (1). Consider the recursively defined stochastic approximation procedure defined on I by

Z_{n+1} = Z_n + γ_{n+1} g(Z_n) + γ_{n+1} ΔR_{n+1}.

Let z* be an attracting zero of g in I and G(z*) its attracting interval. Then, on the event where Z_n visits infinitely often a compact subset of G(z*), and provided the perturbation sequence (ΔR_n)_{n≥1} satisfies Assumption (1), Z_n converges to z*.
(b) The Hoeffding condition (see [1]): if (γ_n)_{n≥1} is nonincreasing and Σ_{n≥1} e^{−ϑ/γ_n} < +∞ for every ϑ > 0, then Assumption (1) is satisfied.
Remark. The monotonicity assumption on the sequence (γ_n) can be relaxed to γ_n → 0 and sup_{n,k≥1} ...

Basic properties of the LRP algorithm
We first recall the definition of the algorithm. We are interested in the asymptotic behavior of the sequence (X_n)_{n∈N}, where X_0 = x, with x ∈ (0, 1), and

X_{n+1} = X_n + γ_{n+1} [(1 − X_n) 1_{A_{n+1}} − ρ_{n+1} X_n 1_{A^c_{n+1}}] 1_{{U_{n+1} ≤ X_n}} + γ_{n+1} [ρ_{n+1} (1 − X_n) 1_{B^c_{n+1}} − X_n 1_{B_{n+1}}] 1_{{U_{n+1} > X_n}}.

Throughout the paper, we assume that (γ_n)_{n≥1} is a non-increasing sequence of positive numbers and that (ρ_n)_{n≥1} is a sequence of positive numbers satisfying γ_n ρ_n < 1; (U_n)_{n≥1} is a sequence of independent random variables which are uniformly distributed on the interval [0, 1]; the events A_n, B_n satisfy P(A_n) = p_A and P(B_n) = p_B, where 0 < p_B ≤ p_A < 1; and the sequences (U_n)_{n≥1} and (1_{A_n}, 1_{B_n})_{n≥1} are independent. The natural filtration of the sequence (U_n, 1_{A_n}, 1_{B_n})_{n≥1} is denoted by (F_n)_{n≥0}, and we set π = p_A − p_B. With this notation, we have, for n ≥ 0,

X_{n+1} = X_n + γ_{n+1} h(X_n) + γ_{n+1} ρ_{n+1} κ(X_n) + γ_{n+1} ΔM_{n+1},   (2)

where the functions h and κ are defined by

h(x) = π x(1 − x),   κ(x) = (1 − p_B)(1 − x)^2 − (1 − p_A) x^2,

and the sequence (M_n)_{n≥0} is the martingale defined by M_0 = 0 and

ΔM_{n+1} = γ_{n+1}^{-1} (X_{n+1} − X_n) − h(X_n) − ρ_{n+1} κ(X_n).   (3)

Observe that the increments ΔM_{n+1} are bounded.
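The decomposition (2) can be checked by Monte Carlo: averaging the one-step increment of the recursion at a frozen state x should reproduce γ(h(x) + ρκ(x)). The step function below re-implements the recursion as we read it, with h(x) = πx(1 − x) and κ(x) = (1 − p_B)(1 − x)² − (1 − p_A)x² as reconstructed above; the tolerance in the check is statistical.

```python
import random

def lrp_increment(x, gamma, rho, p_a, p_b, rng):
    """One-step increment X_{n+1} - X_n of the (assumed) LRP recursion."""
    if rng.random() <= x:                          # A evaluated
        return gamma * (1 - x) if rng.random() < p_a else -gamma * rho * x
    return -gamma * x if rng.random() < p_b else gamma * rho * (1 - x)

def empirical_drift(x, gamma, rho, p_a, p_b, n_samples=200000, seed=0):
    """Monte Carlo estimate of E[X_{n+1} - X_n | X_n = x] / gamma."""
    rng = random.Random(seed)
    s = sum(lrp_increment(x, gamma, rho, p_a, p_b, rng)
            for _ in range(n_samples))
    return s / (n_samples * gamma)

def h(x, p_a, p_b):
    """Reward mean field pi*x*(1-x)."""
    return (p_a - p_b) * x * (1 - x)

def kappa(x, p_a, p_b):
    """Penalty mean field (1-p_B)(1-x)^2 - (1-p_A)x^2."""
    return (1 - p_b) * (1 - x) ** 2 - (1 - p_a) * x ** 2
```

The empirical drift at, say, x = 0.3 with ρ = 0.5 matches h(x) + ρκ(x) up to Monte Carlo noise, which supports the stated decomposition.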

Constant penalty rate
In this subsection, we assume ρ_n = ρ for all n ≥ 1, with 0 < ρ ≤ 1. We then have

X_{n+1} = X_n + γ_{n+1} h_ρ(X_n) + γ_{n+1} ΔM_{n+1}, with h_ρ = h + ρκ,

and there exists a unique x*_ρ ∈ (0, 1) such that h_ρ(x*_ρ) = 0. By a straightforward computation, x*_ρ is the unique root in (0, 1) of the quadratic equation

π(1 − ρ) x^2 − (π − 2ρ(1 − p_B)) x − ρ(1 − p_B) = 0.

In particular, x*_ρ = 1/2 if π = 0, regardless of the value of ρ. We also have x*_ρ > 1/2 if and only if p_A > p_B (4). Now, let x be a solution of the ODE dx/dt = h_ρ(x). If x(0) < x*_ρ, x is non-decreasing, and if x(0) > x*_ρ, x is non-increasing; in both cases, lim_{t→∞} x(t) = x*_ρ. It follows that the interval [0, 1] is a domain of attraction for x*_ρ. Consequently, using Kushner and Clark's ODE lemma (see Theorem 1), one reaches the following conclusion.
The natural interpretation, given the above inequalities on x*_ρ, is that this algorithm never fails to point out the best trader, thanks to Inequality (4), but it never assigns the whole fund to this trader, as the original LRI procedure did.
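Under our reconstruction of the mean field, h_ρ(x) = πx(1 − x) + ρ[(1 − p_B)(1 − x)² − (1 − p_A)x²], the interior equilibrium x*_ρ can be located numerically by bisection; the values of p_A, p_B, ρ below are illustrative:

```python
def h_rho(x, p_a, p_b, rho):
    """Mean field of the constant-penalty LRP algorithm (assumed form):
    reward drift pi*x*(1-x) plus rho times the penalty drift kappa(x)."""
    pi = p_a - p_b
    kappa = (1 - p_b) * (1 - x) ** 2 - (1 - p_a) * x ** 2
    return pi * x * (1 - x) + rho * kappa

def x_star(p_a, p_b, rho, tol=1e-12):
    """Unique root of h_rho in (0, 1), found by bisection; note that
    h_rho(0) = rho*(1-p_b) > 0 and h_rho(1) = -rho*(1-p_a) < 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h_rho(mid, p_a, p_b, rho) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

One can check numerically that x*_ρ lies strictly between 1/2 and 1 when p_A > p_B, below 1/2 when p_A < p_B, and at exactly 1/2 when π = 0, as stated above.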

Convergence when the penalty rate goes to zero
Proposition 2 Assume lim_{n→∞} ρ_n = 0. The sequence (X_n)_{n∈N} is almost surely convergent, and its limit X_∞ satisfies X_∞ ∈ {0, 1} with probability 1.
Proof: We first write the algorithm in its canonical form (2). It is straightforward to check that the ODE ẋ = h(x) has two equilibrium points, 0 and 1, where 1 is attracting with (0, 1] as attracting interval and 0 is unstable. Since the martingale increments ΔM_n are bounded, it follows from the assumptions on the sequence (γ_n)_{n≥1} and the Hoeffding condition (see Theorem 1(b)) that the corresponding noise terms are asymptotically negligible. On the other hand, the function κ being bounded on [0, 1] and ρ_n converging to 0, the penalty terms are asymptotically negligible as well. Finally, the sequence (ΔR_n)_{n≥1} satisfies Assumption (1). Consequently, either X_n visits infinitely often an interval [ε, 1] for some ε > 0, and then X_n converges toward 1, or X_n converges toward 0. ♦

Remark 1 If π = 0, i.e. p_A = p_B, the algorithm reduces to

X_{n+1} = X_n + γ_{n+1} ρ_{n+1} κ(X_n) + γ_{n+1} ΔM_{n+1}.

The number 1/2 is the unique equilibrium of the ODE ẋ = (1 − p_A)(1 − 2x), and the interval [0, 1] is a domain of attraction. Assuming Σ_{n=1}^∞ ρ_n γ_n = +∞, and that the sequence (γ_n/ρ_n)_{n≥1} is non-increasing and satisfies a Hoeffding-type condition (cf. Theorem 1(b)), it can be proved, using the Kushner–Clark ODE lemma (Theorem 1), that lim_{n→∞} X_n = 1/2 almost surely. Concerning the asymptotics of the algorithm when π = 0 and γ_n = g ρ_n (for which the above condition is not satisfied), we refer to the final remark of the paper.
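Remark 1 can be illustrated numerically under the assumed recursion: with π = 0 the reward drift vanishes, leaving only the penalty drift toward 1/2. The shifted sequences γ_n = (n+5)^{-0.4} and ρ_n = (n+5)^{-0.2} are illustrative choices with Σ ρ_n γ_n = ∞ and γ_n/ρ_n non-increasing:

```python
import random

def lrp_balanced(x0, p, n_steps, seed=0):
    """LRP recursion with p_A = p_B = p (so pi = 0): the only drift is
    the penalty term, whose ODE equilibrium is 1/2."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        gamma, rho = (n + 5) ** -0.4, (n + 5) ** -0.2
        if rng.random() <= x:                     # A evaluated
            x += gamma * (1 - x) if rng.random() < p else -gamma * rho * x
        else:                                     # B evaluated
            x += -gamma * x if rng.random() < p else gamma * rho * (1 - x)
    return x
```

Starting well away from 1/2 (say at 0.9), runs drift back toward 1/2, with fluctuations that shrink as γ_n/ρ_n → 0.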
From now on, we will assume that p_A > p_B. The next proposition shows that the penalized algorithm is infallible under very light assumptions on γ_n and ρ_n.

Proposition 3 Assume lim_{n→∞} ρ_n = 0 and Σ_{n≥1} γ_n ρ_n = +∞. Then lim_{n→∞} X_n = 1 almost surely.

Proof: We have from (2), since h ≥ 0 on the interval [0, 1], ... Since the jumps ΔM_j are bounded, we have ... Hence, it follows that, still on the set {X_∞ = 0}, ... Therefore, we must have P(X_∞ = 0) = 0. ♦

The following proposition will give a control on the conditional variance process of the martingale (M_n)_{n∈N}, which will be crucial to elucidate the rate of convergence of the algorithm.

Proposition 4
We have, for n ≥ 0, ...

Then, the sequence ...

Note that the assumptions of Theorem 2 are satisfied if γ_n = C/n^a and ρ_n = C′/n^r, with C, C′ > 0, 0 < r < a and a + r < 1. In fact, we will see that for this choice of parameters, convergence holds with probability one (see Theorem 3). Before proving Theorem 2, we introduce the notation ... We have, from (2), ... Hence ...

It follows from the assumption ρ
Lemma 1 We have ... and ... Moreover, with the notation ..., ...

Remark 2 Note that, as the proof will show, Lemma 1 remains valid if the condition lim_{n→∞} γ_n/ρ_n = 0 in (6) is replaced by the boundedness of the sequence (γ_n/ρ_n)_{n≥1}. In particular, the last statement, which implies the tightness of the sequence (Y_n)_{n≥1}, will be used in Section 3.
On the other hand, for l ≤ n < ν_l, we have ... By summing up these inequalities, we get (7) and (8).
By taking expectations in (7), we get ... We then have ...

Proof: It suffices to show convergence to 0 in probability of the associated conditional variances T_n, defined by ... We know from Proposition 4 that T_n decomposes as T_n^{(1)} + T_n^{(2)}, where ... and ... We first prove that lim_{n→∞} T_n^{(2)} = 0: ... Therefore ..., and lim_{n→∞} T_n^{(2)} = 0 follows from Cesàro's lemma. We now deal with T^{(1)}: ..., so that, the sequence (γ_n)_{n≥1} being non-increasing with limit 0, we only need to prove that lim_{n→∞} T_n^{(1)} = 0 in probability, where ... Now, with the notation of Lemma 1, we have, for n ≥ l > 1 and ε > 0, ... Using Lemma 1, lim_{n→∞} γ_n/ρ_n = 0 and (9), we have ... We also know that lim_{l→∞} P(ν_l < ∞) = 0. Hence, using lim ..., ... Going back to (7) and (8), using Lemma 2 with p = π_+ and π_−, and since π_+ and π_− can be made arbitrarily close to π, the theorem is proved.
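The drift balance behind the rate result can be illustrated numerically: writing the recursion as we read it, 1 − X_n settles where the reward drift π(1 − x) compensates the penalty drift ρ_n(1 − p_A), suggesting (1 − X_n)/ρ_n ≈ (1 − p_A)/π. This is a sketch under assumed dynamics, with illustrative shifted sequences γ_n = (n+5)^{-0.6} and ρ_n = (n+5)^{-0.3}:

```python
import random

def scaled_error(p_a, p_b, n_steps, seed=0):
    """Run the (assumed) LRP recursion and return Y_n = (1 - X_n)/rho_n."""
    rng = random.Random(seed)
    x = 0.5
    for n in range(1, n_steps + 1):
        gamma, rho = (n + 5) ** -0.6, (n + 5) ** -0.3
        if rng.random() <= x:                     # A evaluated
            x += gamma * (1 - x) if rng.random() < p_a else -gamma * rho * x
        else:                                     # B evaluated
            x += -gamma * x if rng.random() < p_b else gamma * rho * (1 - x)
    return (1 - x) / (n_steps + 5) ** -0.3
```

For p_A = 0.8 and p_B = 0.3 the predicted level is (1 − p_A)/π = 0.4, and averages over independent runs land near it.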

Almost sure convergence
Theorem 3 In addition to (6), we assume that ... for all β ∈ [0, 1], and that, for some η > 0, we have ... Then, with probability 1, ...

Note that the assumptions of Theorem 3 are satisfied if γ_n = Cn^{−a} and ρ_n = C′n^{−r}, with C, C′ > 0, 0 < r < a and a + r < 1. The proof of Theorem 3 is based on the following lemma, which will be proved later.
Proof of Theorem 3: We start from the following form of (2): ... We know that lim_{n→∞} X_n = 1 a.s. Therefore, given π_+ and π_−, with 0 < π_− < π < π_+ < 1, there exists l ∈ N such that, for n ≥ l, ... By summing up these inequalities, we get, for n ..., ... and ... We have, with probability 1, lim ... On the other hand, ... where we have used the condition ... We deduce from (14) and (15) that ... and, also, that lim ... It follows from Lemma 3 that, given α ∈ [0, 1], we have, on the set ..., ... Together with (12) and (13), this implies ... We obviously have P(E_α) = 1 for α = 1. We deduce from the previous argument that if P(E_α) = 1 and α − η ..., ... Let j be the largest integer such that α_j > η (note that j exists because lim_{k→∞} α_k < 0). We have ... ≤ 0.

♦
We now turn to the proof of Lemma 3, which is based on the following classical martingale inequality (see [13], Remark 1, p. 14, for a proof in the case of i.i.d. random variables; the extension to bounded martingale increments is straightforward).
Lemma 4 (Bernstein's inequality for bounded martingale increments) Let (Z_i)_{1≤i≤n} be a finite sequence of square integrable random variables, adapted to a filtration (F_i)_{1≤i≤n}, such that E(Z_i | F_{i−1}) = 0, E(Z_i^2 | F_{i−1}) ≤ σ_i^2 and |Z_i| ≤ Δ_n, where σ_1^2, ..., σ_n^2, Δ_n are deterministic positive constants. Then, the following inequality holds for every x > 0:

P(Σ_{i=1}^n Z_i ≥ x) ≤ exp(−x^2 / (2(Σ_{i=1}^n σ_i^2 + Δ_n x/3))).

We will also need the following technical result.
Lemma 5 Let (θ_n)_{n≥1} be the sequence of positive numbers defined by θ_n = Π_{k=1}^n (1 − pγ_k), for some p ∈ (0, 1), and let (ξ_n)_{n≥1} be a sequence of non-negative numbers satisfying ... We have ...

Proof: First observe that the condition ... and that, given ε > 0, we have, for n large enough, ... where we have used the fact that the sequence (γ_n) is non-increasing. Since γ_n ξ_n ∼ γ_{n−1} ξ_{n−1}, we have, for n large enough, say n ≥ n_0, ... Therefore, for n > n_0, ... From this, we easily deduce that lim ..., where, for the first equality, we have assumed ξ_0 = 0, and, for the last one, we have used again ... Note that {sup_n ρ_n^α Y_n < ∞} = ∪_{µ>0} {ν_µ = ∞}. On the set {ν_µ = ∞}, we have ... We now apply Lemma 4 with Z_i = (γ_i/θ_i) 1_{{i ≤ ν_µ}} ΔM_i. We have, using Proposition 4, ... where we have used the fact that, on {i ≤ ν_µ}, ..., for some C_µ > 0, depending only on µ. Using Lemma 5, we have ... On the other hand, because the jumps ΔM_i are bounded, ..., and the sequence (γ_n/θ_n) is non-increasing for n large enough. Therefore, we have sup ..., where the positive constants C_1, C_2, C_3 and C_4 depend on λ_0 and µ, but not on n.
Using (11) and the Borel–Cantelli lemma, we conclude that, on {ν_µ = ∞}, we have, for n large enough, ... a.s., and, since λ_0 is arbitrary, this completes the proof of the lemma.
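The bound of Lemma 4 can be sanity-checked numerically in the i.i.d. special case (independent centered increments form a martingale difference sequence). The right-hand side below is the classical Bernstein bound; the parameters are illustrative:

```python
import math
import random

def bernstein_bound(x, sum_sigma2, delta):
    """Right-hand side of Bernstein's inequality for bounded increments."""
    return math.exp(-x * x / (2.0 * (sum_sigma2 + delta * x / 3.0)))

def empirical_tail(x, n, n_paths, seed=0):
    """Empirical P(sum of n iid Uniform[-1,1] increments >= x)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if s >= x:
            hits += 1
    return hits / n_paths
```

With n = 100 increments, each of variance 1/3 and bounded by Δ = 1, the empirical tail at a moderately large deviation level sits comfortably below the Bernstein bound.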

♦

3 Weak convergence of the normalized algorithm
Throughout this section, we assume (in addition to the initial conditions on the sequences (γ_n) and (ρ_n)) that ..., where g is a positive constant. Note that a possible choice is γ_n = ag/√n and ρ_n = a/√n, with a > 0. Under these conditions, we have ρ_n − ρ_{n−1} = o(γ_n^2), and we can write, as at the beginning of Section 2, ..., where lim_{n→∞} ε_n = 0 and lim_{n→∞} π_n = π. As observed in Remark 2, we know that, under the assumptions (16), the sequence (Y_n)_{n≥1} is tight. We will prove that it is convergent in distribution.
Theorem 4 Under conditions (16), the sequence (Y_n = (1 − X_n)/ρ_n)_{n∈N} converges weakly to the unique stationary distribution of the Markov process on [0, +∞) with generator L defined by

Lf(y) = ((1 − p_A) − p_A y) f′(y) + (p_B y / g)(f(y + g) − f(y)),

for f continuously differentiable and compactly supported in [0, +∞).
The method for proving Theorem 4 is based on the classical functional approach to central limit theorems for stochastic algorithms (see Bouton [2], Kushner [10], Duflo [6]). The long time behavior of the sequence (Y_n) will be elucidated through the study of a sequence of continuous-time processes (Y_t^{(n)})_{t≥0}, which will be proved to converge weakly to the Markov process with generator L. We will show that this process has a unique stationary distribution ν, and that ν is the weak limit of the sequence (Y_n)_{n∈N}.
The sequence Y^{(n)} is defined as follows. Given n ∈ N and t ≥ 0, set ..., where ...

Theorem 5 Under the assumptions of Theorem 4, the sequence of continuous time processes (Y^{(n)})_{n∈N} converges weakly (in the sense of Skorokhod) to a Markov process with generator L.
The proof of Theorem 5 is done in two steps: in Section 3.1 we prove tightness, and in Section 3.2 we characterize the limit through a martingale problem.

Tightness
It follows from (17) that the process Y^{(n)} admits the following decomposition: ..., with ... and ... The process (M_t^{(n)})_{t≥0} is a square integrable martingale with respect to the filtration (F_t^{(n)}), and we have ... We already know (see Remark 2) that the sequence (Y_n)_{n∈N} is tight. Recall that, in order for the sequence (M^{(n)}) to be tight, it is sufficient that the sequence (⟨M^{(n)}⟩) be C-tight (see [7], Theorem 4.13, p. 358, Chapter VI). Therefore, the tightness of the sequence (Y^{(n)}) in the sense of Skorokhod will follow from the following result.
For the proof of this proposition, we will need the following lemma.
Lemma 6 Define ν_l as in Lemma 1, for l ∈ N. There exists a positive constant C such that, for all l, n, N ∈ N with l ≤ n ≤ N, we have ...

Proof: The function κ being bounded on [0, 1], it follows from (17) that there exist positive, deterministic constants a and b such that, for all n ∈ N, ... We also know from Proposition 4 that ... From (21), we derive, for j ≥ n, ... We have, using Markov's inequality and Lemma 1, ... On the other hand, using Doob's inequality, lim_{n→∞} ρ_n = 0 and Lemma 1, we get, for some C > 0, ..., and, since we have assumed λ ≥ 1, the proof of the lemma is complete.

♦
Proof of Proposition 5: Given s and t, with 0 ≤ s ≤ t, we have, using the boundedness of κ, ... for some a, b > 0. Similarly, using (22), we have ... for some a′, b′ > 0. These inequalities express the fact that the processes B^{(n)} and ⟨M^{(n)}⟩ are strongly dominated (in the sense of [7], Definition 3.34) by a linear combination of the processes X^{(n)} and Z^{(n)}, where ... Therefore, we only need to prove that the sequences (X^{(n)}) and (Z^{(n)}) are C-tight. This is obvious for the sequence (X^{(n)}), which in fact converges to the deterministic process t. We now turn to (Z^{(n)}): ..., where we have used Σ_{k=n+1} γ_k and the monotonicity of the sequence (γ_n)_{n≥1}.
Therefore, for δ > 0 and n large enough so that γ_{n+1} ≤ δ, ... We have, from Lemma 6, ... We easily conclude from these estimates that, given T > 0, ε > 0 and η > 0, we have, for n large enough and δ small enough, ..., which proves the C-tightness of the sequence (Z^{(n)}). ♦

Identification of the limit
Lemma 7 Let f be a C^1 function with compact support in [0, +∞). We have ..., where the operator L is defined by ... and the sequence (Z_n)_{n∈N} satisfies lim_{n→∞} Z_n = 0 in probability.
Proof: From (17), we have ..., where lim_{n→∞} ζ_n = 0 in probability. Going back to (3), we rewrite the martingale increment ΔM_{n+1} as follows: ... Hence, ... Note that, due to our assumptions on γ_n and ρ_n, we have, for some deterministic positive constant C, ... We will first show that ..., with the notation Plim for a limit in probability. Denote by w the modulus of continuity of f′: ... We have, for some (random) θ ∈ (0, 1), ..., where ..., where we have used Ȳ_{n+1} = Ỹ_n + ξ_{n+1} and (25). In order to get (26), it suffices to prove that lim_{n→∞} ... Observe that lim_{n→∞} Ŷ_n = 0 in probability (recall that lim_{n→∞} X_n = 1 almost surely). Therefore, we have (26).

We deduce from E(ΔM_{n+1} | F_n) = 0 that ..., so that the proof will be complete once we have shown ... We have ... For the behavior of F_n as n goes to infinity, we use ... and lim_{n→∞} ρ_n/γ_{n+1} = 1/g, so that ... For the behavior of G_n, we write, using lim ..., ..., with Plim_{n→∞} η_n = 0, so that, using the fact that f is C^1 with compact support and the tightness of (Y_n), ..., which completes the proof of (27).

♦
Proof of Theorem 5: As mentioned before, it follows from Proposition 5 that the sequence of processes (Y^{(n)}) is tight in the Skorokhod sense. On the other hand, it follows from Lemma 7 that, if f is a C^1 function with compact support in [0, +∞), we have ..., where (M_n) is a martingale and (Z_n) is an adapted sequence satisfying Plim_{n→∞} Z_n = 0. It is easy to verify that M^{(n)} is a martingale with respect to F^{(n)}.
We also have ..., where Plim_{n→∞} R_t^{(n)} = 0. It follows that any weak limit of the sequence (Y^{(n)})_{n∈N} solves the martingale problem associated with L. From this, together with the study of the stationary distribution of L (see Section 3.3), we will deduce Theorem 4 and Theorem 5. ♦

The stationary distribution
Theorem 6 The Markov process (Y_t)_{t≥0} on [0, +∞) with generator L has a unique stationary probability distribution ν. Moreover, ν has a density on [0, +∞), which vanishes on (0, r_A] (where r_A = (1 − p_A)/p_A) and is positive and continuous on the open interval (r_A, +∞). The stationary distribution ν also satisfies the following property: for every compact set K in [0, +∞) and every bounded continuous function f, we have ... Before proving Theorem 6, we show how Theorem 4 follows from (28).
Proof of Theorem 4: Fix t > 0. For n large enough, we have t < Σ_{k=1}^n γ_k, so that there exists m ∈ {1, ..., n − 1} such that Σ_{k=m+1}^n γ_k ≤ t < Σ_{k=m}^n γ_k. Since t is fixed, the condition Σ_{k=m+1}^n γ_k ≤ t implies lim_{n→∞} m = ∞ and lim_{n→∞} t_m = t. Now, given ε > 0, there is a compact set K such that, for every weak limit µ of the sequence (Y_n)_{n∈N}, µ(K^c) < ε. Using (28), we choose t such that ... Now take a weakly convergent subsequence (Y_{n_k})_{k∈N}. By another subsequence extraction, we can assume that the sequence (Y^{(n_k)}) converges weakly to a process Y^{(∞)} which satisfies the martingale problem associated with L. We then have, due to the quasi-left continuity of Y^{(∞)}, lim ... for every bounded continuous function f (keep in mind that the functional tightness of (M^{(n)}) follows from Theorem 1.13 in [7], which in turn relies on the so-called Aldous criterion; any weak limit of such a sequence in the Skorokhod sense is then quasi-left continuous, and so is Y^{(∞)} since B is pathwise continuous). Hence lim ... Observe that the law of Y_0^{(∞)} is a weak limit of the sequence (Y_n), so that ... It follows that any weak limit of the sequence (Y_n)_{n∈N} is equal to ν, which completes the proof of Theorem 4.

♦
For the proof of Theorem 6, we first observe that the generator L depends in an affine way on the state variable y. This affine structure suggests that the Laplace transform E_y e^{−pY_t} has the form e^{ϕ_p(t) + yψ_p(t)}, for some functions ϕ_p and ψ_p. Affine models have recently been extensively studied in connection with interest rate modelling (see for instance [4] or [5]). The following proposition gives a precise description of the Laplace transform.
and the convergence is uniform on compact sets. This implies the uniqueness of the stationary distribution, as well as (28). We also have the Laplace transform of ν: ... This yields ν([0, r_A)) = 0.

• Further properties of the invariant distribution ν. The stationary distribution satisfies ∫ Lf dν = 0 for any continuously differentiable function f with compact support in [0, +∞). This reads ..., where r = p_B/p_A and r_A = (1 − p_A)/p_A. We first show that ν({r_A}) = 0. Let ϕ be a non-negative continuously differentiable function satisfying ϕ = 1 in a neighbourhood of the origin and ϕ = 0 outside the interval [−1, 1]. For n ≥ 1, let f_n(y) = ϕ(n(y − r_A)), y ∈ R.
We have f_n(y) = 0 if |y − r_A| ≥ 1/n. In particular, the support of f_n lies in [0, +∞) for n large enough. Applying (31) with f = f_n, we get ... Observe that lim ..., where we have used ν((−∞, r_A)) = 0. On the other hand, we have ... Hence ν({r_A}) = 0. We now study the measure ν on the open interval (r_A, +∞). Denote by D the set of all infinitely differentiable functions with compact support in (r_A, +∞). We deduce from (31) that, for f ∈ D, ... Denote by ν_g the measure defined by ∫ f(y) ν_g(dy) = ∫ f(y + g) ν(dy). We deduce from (32) that ν satisfies the following equation in the sense of distributions: ... Denote by F the function defined by F(y) = (y − r_A)^{d−1} e^{ry/g}, where d = r r_A/g. We have ..., so that the equation satisfied by ν reads ..., where the function G is defined by G(y) = −(r/g) (y − g)/(y − r_A).
On the set (r_A, r_A + g), the measure ν_g vanishes, so that ν = λ_0 F for some non-negative constant λ_0. At this point, we know that the restriction of the measure ν to the set (0, r_A + g) has a density, which vanishes on (0, r_A) and is given by λ_0 F on (r_A, r_A + g).
We will prove by induction that the distribution ν coincides with a continuous function on (r_A, r_A + ng), which is infinitely differentiable on (r_A + (n − 1)g, r_A + ng). The claim has been proved for n = 1. Assume that it is true for n. On the set (r_A, r_A + (n + 1)g), the distributional derivative of (1/F)ν coincides with the function y → (G(y)/F(y)) ν(y − g), which is locally integrable on (r_A, r_A + ng + g), continuous on (r_A + g, r_A + ng + g), and infinitely differentiable on (r_A + ng, r_A + ng + g), due to the induction hypothesis (there may be a discontinuity at r_A + g if d < 1). It follows that (1/F)ν is a continuous (resp. infinitely differentiable) function, and so is ν, on (r_A, r_A + (n + 1)g) (resp. (r_A + ng, r_A + ng + g)). We have proved that ν has a continuous density on (r_A, +∞), which is infinitely differentiable on the open set ∪_{n=1}^∞ (r_A + (n − 1)g, r_A + ng). Finally, we prove that the density of ν is positive on (r_A, +∞). Note that G(y) < 0 if y > g and that the density vanishes at y − g if y < g. Therefore ((1/F)ν)′ ≤ 0, so that the function y → ν(y)/F(y) is nonincreasing. It follows that λ_0 cannot be zero (otherwise ν would be identically zero). Hence ν(y) > 0 for y ∈ (r_A, r_A + g). Now, if ν(y) > 0 for y ∈ (r_A + ng − g, r_A + ng), the function ν/F is strictly decreasing on (r_A + ng, r_A + ng + g) and, being nonnegative, cannot vanish there. So, by induction, the density is positive on (r_A, +∞). This completes the proof of Theorem 6.

♦
Additional remarks. • The proof of Theorem 6 provides a bit more information on the invariant distribution ν. Let g > 0 and let φ_g denote its continuous density on (r_A, +∞): the function φ_g is C^∞ on [r_A, +∞) \ (r_A + gN), and it follows from (34) and the definitions of r and r_A (and d = r r_A/g, see the proof of Theorem 6) that φ_g(r_A) = +∞ if g > g*, φ_g(r_A) ∈ (0, +∞) if g = g* and φ_g(r_A) = 0 if g < g*, where g* = p_B(1 − p_A)/p_A^2 ∈ (0, r_A). As concerns the regularity of the density φ_g at points y ∈ r_A + gN, one easily derives from Equation (33) that, for every m, k ∈ N:
– φ_g is C^{m+k} at r_A + kg as soon as g < g*/(m + 1);
– the (m + k)-th derivative φ ...
Note that, as one could expect, this variance goes to 0 as g → 0. As a conclusion, we present in Figure 1 three examples of shape for φ_g. They were obtained from an exact simulation of the Markov process (Y_t)_{t≥0} (associated with the generator L) at its jump times: we approximated the p.d.f. by a histogram method, using Birkhoff's ergodic theorem.
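The exact simulation at jump times mentioned above can be sketched as follows, assuming the generator has the affine jump-drift form Lf(y) = ((1 − p_A) − p_A y) f′(y) + (p_B y/g)(f(y + g) − f(y)), which is consistent with r_A = (1 − p_A)/p_A and with jumps of size g. Between jumps the flow is explicit, y(t) = r_A + (y_0 − r_A)e^{−p_A t}, and the next jump time is obtained by inverting the integrated intensity against an exponential variable:

```python
import math
import random

def simulate_pdmp(p_a, p_b, g, n_jumps, y0, seed=0):
    """Exact simulation at jump times of the jump-drift process with
    flow y' = (1-p_a) - p_a*y and jumps of size +g at rate p_b*y/g.
    Returns the time-average of y, using exact integrals between jumps."""
    r_a = (1.0 - p_a) / p_a
    rng = random.Random(seed)
    y, total_time, integral = y0, 0.0, 0.0
    for _ in range(n_jumps):
        e = rng.expovariate(1.0)

        def big_lambda(t):
            # Integrated jump intensity Lambda(t) along the explicit flow.
            return (p_b / g) * (r_a * t
                                + (y - r_a) * (1.0 - math.exp(-p_a * t)) / p_a)

        hi = 1.0
        while big_lambda(hi) < e:           # bracket the next jump time
            hi *= 2.0
        lo = 0.0
        for _ in range(60):                 # bisection for Lambda(t) = e
            mid = 0.5 * (lo + hi)
            if big_lambda(mid) < e:
                lo = mid
            else:
                hi = mid
        t = 0.5 * (lo + hi)
        # Exact integral of y(s) over the inter-jump interval.
        integral += r_a * t + (y - r_a) * (1.0 - math.exp(-p_a * t)) / p_a
        total_time += t
        y = r_a + (y - r_a) * math.exp(-p_a * t) + g   # flow, then jump
    return integral / total_time
```

Applying ∫ Lf dν = 0 with f(y) = y gives the stationary mean (1 − p_A)/(p_A − p_B) under this assumed generator, and the long-run time average of the simulated path reproduces it; starting above r_A, the path also stays above r_A, consistent with the support statement of Theorem 6.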

Figures should be here
A final remark about the case π = 0 and γ_n = g ρ_n. In that setting (see Remark 1), the asymptotics of the algorithm cannot be elucidated by using the ODE approach, since it only holds in a weak sense. Setting Y_n = 1 − 2X_n, one checks that Y_n ∈ [−1, 1] and ... and that E((ΔM_{n+1})^2 | F_n) = .... Then, an approach similar to that developed in this section (but significantly less technical, since (Y_n) is bounded by 1) shows that Y_n converges in distribution to the invariant distribution µ of the Brownian diffusion with generator

Lf(y) = −2g(1 − p_A) y f′(y) + (1/2) g^2 p_A (1 − y^2) f″(y).

In that case, it is well known that µ has a density function for which a closed form is available (see [8]), namely µ(dy) = m(y) dy with

m(y) = C_{g,r_A} (1 − y^2)^{2r_A/g − 1} 1_{(−1,1)}(y).
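The normalizing constant C_{g,r_A} can be made explicit through the Beta function: with α = 2r_A/g, a change of variable gives ∫_{−1}^1 (1 − y²)^{α−1} dy = 2^{2α−1} B(α, α), so C_{g,r_A} = 1/(2^{2α−1} B(α, α)). The numerical check below of this computation of ours uses illustrative parameters chosen so that α > 1:

```python
import math

def normalizing_constant(p_a, g):
    """C_{g,r_A} for m(y) = C * (1 - y^2)**(2*r_A/g - 1) on (-1, 1)."""
    alpha = 2.0 * (1.0 - p_a) / p_a / g          # alpha = 2*r_A/g
    beta = math.gamma(alpha) ** 2 / math.gamma(2.0 * alpha)  # B(alpha, alpha)
    return 1.0 / (2.0 ** (2.0 * alpha - 1.0) * beta)

def density_moment(p_a, g, k, n_grid=200000):
    """Midpoint-rule integral of y^k * m(y) over (-1, 1)."""
    c = normalizing_constant(p_a, g)
    alpha = 2.0 * (1.0 - p_a) / p_a / g
    h = 2.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        y = -1.0 + (i + 0.5) * h
        total += y ** k * c * (1.0 - y * y) ** (alpha - 1.0)
    return total * h
```

For a symmetric density of this Beta type, the total mass is 1 and the second moment equals 1/(2α + 1), both of which the quadrature reproduces.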
... for some positive constant C. Therefore, since Σ_n γ_n ρ_n = ∞, ... This proves the proposition. ♦

2 The rate of convergence: pointwise convergence

Convergence in probability

Let u_p(t, y) = exp(ϕ_p(t) + yψ_p(t)), where ψ_p and ϕ_p are defined as in the statement of the proposition. The existence of ψ_p follows from Lemma 8. An easy computation shows that ∂u_p/∂t − Lu_p = 0 on [0, +∞) × [0, +∞), so that, for T > 0, the process (u_p(T − t, Y_t))_{0≤t≤T} is a martingale, and E u_p(T, Y_0) = E u_p(0, Y_T), and the proposition follows easily.