Quasi-stationary distribution and metastability for the stochastic Becker-D\"oring model

We study a stochastic version of the classical Becker-D\"oring model, a well-known kinetic model for cluster formation that predicts the existence of a long-lived metastable state before a thermodynamically unfavorable nucleation occurs, leading to a phase transition phenomena. This continuous-time Markov chain model has received little attention, compared to its deterministic differential equations counterpart. We show that the stochastic formulation leads to a precise and quantitative description of stochastic nucleation events thanks to an exponentially ergodic quasi-stationary distribution for the process conditionally on nucleation has not yet occurred.


Introduction
The Becker-Döring model is a kinetic model for phase transition phenomenon represented schematically by the reaction network  We assume an infinite reservoir of monomer, cluster of size 1, represented in (1.1) by ∅. The parameter z represents the fixed concentration of monomer and will play a key role in the sequel. A cluster of size i ≥ 2, whose population is represented in (1.1) by C i , lengthen to give rise to a cluster of size i + 1 at rate a i z or shorten to give rise a cluster of size i − 1 at rate b i . The rate of apparition of a new cluster of size 2 is a 1 z 2 (without loss of generality). All parameters are positives.
The Becker-Döring (BD) model goes back to the seminal work "Kinetic treatment of nucleation in supersaturated vapors" in [2]. Since then, the model met very different applications ranging from physics to biology. From the mathematical point of view, this model received much more attention in the deterministic literature than the probabilistic one. We refer to our review [6] for historical comments and detailed literature review on theoretical results from the deterministic side. See also [7,14] for recent results on functional law of large number and central limit theorem.
The model was initially designed to explain critical phase condensation phenomena where macroscopic droplets self-assemble and segregate from an initially supersaturated homogeneous mixture of particles, at a rate that is exponentially small in the excess of particles. This led to important applications in kinetic nucleation theory [13]. Mathematical studies in the 90's showed that (in the deterministic context), departing from certain initial conditions, the size distribution of clusters reaches quickly a metastable configuration composed of "small" clusters, and remains arbitrary close to that state for a very large time, before it converges to the true stationary solution that leads to "infinitely large" clusters (interpreted as droplets) [10,11].
Our objective in this note is to re-visit the metastability theory in Becker-Döring model in terms of quasi-stationary distribution (QSD) for the associated continuous-time Markov chain. We prove existence, uniqueness and exponential ergodicity of a QSD for the BD model conditioned on the event that large clusters have not yet appeared. We prove furthermore that the convergence rate towards the QSD is faster than the rate of apparition of (sufficiently) large clusters. Quantitative results are obtained thanks to a surprisingly simple analytical formula for the QSD, that provides also an exact rate of apparition of stable large clusters, consistently with the original heuristic development of Becker and Döring. Outline: We introduce in Sec. 2 the stochastic BD model. In Sec. 3, we gather few (known) results on a related Birth-Death process, and from deterministic modeling of the BD model. Then, we prove exponential decay, in total variation, of the BD process towards its stationary measure in Sec. 4. For any n ≥ 2, similar results are obtained in Sec. 5 for the process conditionally on no cluster larger than n are formed. An estimate on the time for the first cluster larger than n to appear is provided in Sec. 6. The last, but not least, Sec. 7 gathers our results in a quantitative way, and proves that the QSD can be interpreted as a long-lived metastable state, when n equals the critical nucleus size.

Notation:
We denote by N 0 the set of non-negatives integers N 0 = {0, 1, 2 . . .}, N i the set of integers greater or equal to i, [[ ]] for integers interval. For a set A, A c is its complement, 1 A the indicator function on it, #A its cardinality. For two sets A and B their intersection is denoted by A, B. Also, 1 x = 1 {x} and 1 (resp. 0) is the constant function equal to 1 (resp. 0). For two numbers a, b, their minimum is a ∧ b. We denote by µ − ν the total variation distance between two probability measure µ and ν on a countable state space S namely where Γ is the set of probability measures on S × S with marginals µ and ν. E (resp. E µ ) denotes the expectation with respect to the usual probability measure P (resp. µ). Below, we set E = ℓ 1 (N 2 , N 0 ) the space of summable N 0 -valued sequences indexed by N 2 .

The model
The stochastic Becker-Döring (BD) process is a continuous-time Markov chain on the countable state space E with infinitesimal generator A, given for all ψ with finite support on E and C ∈ E, by with the convention C 1 = z, ∆ 1 = e 2 and ∆ i = e i+1 −e i , for each i ≥ 2, where {e 2 , e 3 , . . .} denotes the canonical basis of E namely, e i,k = 1 if k = i and 0 otherwise. We shall however use a different approach, modeling explicitly the size of each individual cluster. On a sufficiently large probability space (Ω, F , P), we introduce: • N 1 , N 2 , . . . a denumerable family of independent Poisson point measure with intensity the Lebesgue measure dsdu on R 2 + .
• T 1 , T 2 , . . . a collection of random times such that the T k − T k−1 are independent exponential random variable of parameter a 1 z 2 , independent from the above Poisson point measure as well, with T 0 = 0. Let Π in a probability distribution on E and C(0) = (C 2 (0), C 3 (0), . . .) an E-valued random variable distributed according to Π in . We denote by N in = ∞ i=2 C i (0). By construction N in < ∞ almost surely (a.s.). Then, given C(0), we define X 1 (0), X 2 (0), . . . a denumerable collection of random variables on N 2 such that, a.s. for each i ≥ 2, (2.1) and X in k = 2 for all k > N in . Note this construction may be achieved by a bijective labeling function 1 . Finally, we consider the denumerable collection of stochastic processes X 1 , X 2 , . . . on N 1 solution of the stochastic differential equations, for all t ≥ 0 and k ≥ 1, where by convention T k = 0 if k ≤ 0. The pathwise construction (2. 2) is what we call thereafter the particle description of the BD process. The interpretation is clear: X k (t) denotes the size of the cluster labelled by k at time t; for k ≤ N in , clusters are initially "actives" while for k > N in clusters are initially "inactives" at state 2, and become "activated" at the random arrival times T k−N in . We ensured X k (0) < ∞ a.s. because the C i (0)'s are integer-valued random variables and belong to E, the sequence C(0) is a.s. equally 0 from a certain range. Thus, local existence of càdlàg processes t → X k (t) on N 1 solution to (2.2) can classically be obtained inductively. It is clear from (2.2) that each X k evolves like a Birth-Death process for t > T k (that will be detailed in the next section 3) and are mutually independent conditionally to their initial value. The Reuter's criterion gives a well-known necessary and sufficient condition so that each process X k is non-explosive, namely It is now convenient to go back to the original description at stake. The number of "active" clusters at time t ≥ 0 is given by the counting process , we can prove from standard stochastic calculus that the process C given by C(t) = (C 2 (t), C 3 (t), . . .) for all t ≥ 0 has infinitesimal generator A, and being non-explosive under condition (H0), it is the unique regular jump homogeneous Markov chain on E with infinitesimal generator A and initial distribution Π in , say the BD process. The proof is left to the reader and follows from classical theory, e.g. [1]. In the sequel, C(t) always denote a BD process, and P Π in {C ∈ ·} its (unique) finite dimensional probability distribution given that C(0) is distributed according to Π in . We also set by convention P C = P δC for a deterministic C ∈ E and we recover P Π in {·} = C∈E P C {·} Π in (C).

Behaviour of one cluster
Let X the continuous-time Markov chain on N 1 with transition rate matrix (q i,j ) i,j≥1 whose nonzero entries are (3.1) Remark that i = 1 is absorbing in agreement with (1.1): when a cluster size reaches 1, it "leaves the system". We shall assume standard hypotheses in the BD model [10,11]: Hypothesis (H1) then guarantee (H0) for z = z s for the following reason. The conver- depends on the value of z. Indeed, z s is the radius of convergence of the first series while the second series converges for z > z s and diverges for z < z s . We have a dichotomy in the long time behavior of X related to this value. The case z < z s is called the sub-critical case, for which absorption at state 1 is certain and the expected time of absorption is finite (also called ergodic absorption). The case z > z s is called the super-critical case and absorption at 1 is not certain (also called transient absorption), and the probability to be absorbed at 1 is, according to [8], where p ij (t) = P {X(t) = j | X(0) = i} the probability transition function of X. The limit case z = z s is somewhat technical and depends more deeply on the shape of the coefficients. It is not considered in this note.
Following [10], a precise long time estimate on transient states can be obtained, under the hypothesis (H1) and In such a case, the infinite matrix (q i,j ) i,j≥2 in (3.1) is self-adjoint on the Hilbert space H consisting of the real sequences x = (x 2 , x 3 , . . .) whose norm is We denote by ·, · H the associated scalar product. It turns that (q i,j ) i,j≥2 has a negative maximum eigenvalue −λ, and the following estimate holds for any i ≥ 2, We will also consider the chain X conditioned to remain below a given state n + 1 ≥ 2.
We define the exit time Hence, Y is absorbed either in 1 or n + 1, and the probability to be absorbed at 1 (without visiting state n + 1) is, Again, in [10], the author shows that the truncated matrix (q i,j ) i,j=2,...,n is similar to a symmetric one and then there exists γ n > 0 such that for each i = 2, . . . , n, Note the probability to be absorbed in 1 before time t, p n i1 (t), is monotonously increasing where the constant M i,n , obtained thanks to (3.7), Cauchy-Schwarz inequality and (3.5), is given by We end this preliminary section, noticing that

Long-time behaviour of the BD process
In this section we are concerned with the long-time behaviour of the BD process. Formally the measure Π eq , given by for all C ∈ E, satisfies E Π eq [Aψ(C)] = 0 for any function ψ on E with finite support 2 . Actually, Π eq satisfies the detailed balance condition a i zC i Π eq (C) = b i+1 (C i+1 + 1)Π eq (C + ∆ i ), for all i ≥ 1 and all C ∈ E (with the convention that C 1 = z), as a consequence of the relation a i Q i = b i+1 Q i+1 . In the sub-critical case, Π eq is a probability measure on i=2 c eq i < ∞) and we prove exponential ergodicity towards Π eq . In the super-critical case, Π eq is not a limiting distribution (and ∞ i=2 c eq i = ∞) but the measure defined by 3), we have: • In the sub-critical case (z < z s ), for all t ≥ 0, • In the super-critical case (z > z s ), for all t ≥ 0, and for all n ≥ 2, Not least, remark that E Π eq C, Q i z i < ∞ for z < z s and that f ∈ H for z > z s (see [10]). In the sequel, K n and K always refer to the constants given above. The proof is based on a coupling (described below) to a distribution starting from 0, so that the control of the initial particles in C(0) are a key point.

Lemma 4.2.
Under the hypothesis of Theorem 4.1. Let the collection of processes X 1 , X 2 , . . . being a particle description of the BD process C. We have, for each n ≥ 2, In particular, for the sub-critical case, given by the labelling function, e.g. C(0) = C and X 1 (0) = i 1 , . . . X N (0) = i N satisfy relation (2.1). Since each processes X 1 , . . . X N are independent copy of the chain X given in Sec. 3, conditionally on their initial condition, we have x i for any non-negatives x 1 , . . . , x N . Finally, we conclude that and the proof ends.
We now show, by a coupling argument, that any solution satisfying (4.2) is in total variation exponentially close to the solution that starts with no cluster, namely the deterministic initial condition at 0.
In the sub-critical case the proof readily follows from Lemma 4.2 and the definition of the total variation since because all "active" clusters are equal whenever all initial clusters from Π in have been absorbed. A very similar argument holds in the super-critical case.
Proof of Theorem 4.1. We consider first the sub-critical case. As said, condition (4.2) is satisfied for Π eq since E Π eq C, √ Q H < ∞, thus Lemma 4.3 applies for Π eq as initial distribution. Because the constructed BD process is regular and Π eq is a stationary distribution (i.e. P Π eq {C(t) ∈ ·} = Π eq ), we deduce Going back to any Π in satisfying condition (4.2), applying Lemma 4.3 again and the triangular inequality yield the desired result.
Consider now the super-critical case. Π stat is a product of Poisson distribution P(f i ) on N 0 with mean f i . According to a classical result on Markov population processes, see e.g. [9,Sec. 4], the law P 0 {C(t) ∈ ·} is also a product of Poisson distribution P(c i (t)) on N 0 with mean c i (t) such that c(t) = (c 2 (t), c 3 (t), . . .) solves the deterministic (linear) Becker-Döring equations namely, c(t) = Ac(t) + a 1 z 2 e 2 where A = (q j,i ) i,j≥2 the matrix with entries in (3.1), and initial condition c(0) = 0. Thanks to [10, Theorem III], we have (4.4) and according to [12,Lemma 1], P(c i (t))−P(f i ) ≤ |c i (t)−f i |. The latter, with independence of the marginals of Π stat and of P 0 {C(t) ∈ ·}, estimate (4.4) and Cauchy-Schwarz inequality, entail We conclude again by Lemma 4.3 and the triangular inequality.

A quasi-stationary distribution
Remark that P Π in {τ n > t} > 0 for all times t > 0 and for any Π in supported on E n . We will give in the next section 6 a tight lower bound on that probability in the super-critical case. In this section, we prove exponential ergodicity towards a unique QSD for the BD process conditioned to τ n > t. It is remarkable that we have at hand an explicit QSD, given, for all C ∈ E n , by Proof. Recall assumption (H0) ensures the BD process is regular. Fix n ≥ 2. Let the semi-group P n t ψ(C) = E C [ψ(C(t))1 t<τn ] for t ≥ 0 (i.e. C(0) is distributed according to δ C ), whose generator is for all C ∈ E n (recall C 1 = z) and bounded function ψ on E n . Denote by A * n the dual operator for the generator A n . Some calculations show that the distribution (5.2) satisfies, for any C ∈ E n , with the convention f n 1 = z. Since the f n i given by (5.2) verifies a i zf n i −1 i<n b i+1 f n i+1 = J n for all i ∈ [ [1, n]], all terms but the first cancel in the above expression, so that we obtain A * n Π qsd n = −J n Π qsd n which is the classical spectral criteria of QSD, noticing that J n ≤ a 1 z 2 , see [4,Thm 4.4].
The next theorem shows the QSD is a quasi-limiting distribution for a wide range of initial distribution supported on E n , with an exponential rate of convergence and an explicit (non-uniform) pre-factor.

Theorem 5.2.
Under assumption (H0). Let Π in a probability distribution on E n such that We have for all t ≥ 0, where τ n is defined in (5.1), J n in (3.5), K n in Theorem (4.1), γ n in (3.7), Let the collection of processes X 1 , X 2 , . . . being a particle description of the BD process C. We have Proof. We start by observing that the following relation holds true, τ n = min(τ 0 n , T 1 n , . . . , T N in n ) , where T k n = inf {t > 0 | X k (t) ≥ n + 1} for k = 1, . . . , N in and τ 0 n = inf t ≥ 0 ∃k > N in , X k (t) ≥ n + 1 .
Conditionally on their initial condition, all clusters X k (t) (starting at i k ) are independent from each other, thus the is independent of (X k ) k>N and thus independent of τ 0 n . Then, Still by independence of the clusters from each other, we claim that where X is defined in Sec. 3 and T n in (3.4). This equation is clear for N = 1, and is easily proved by induction. We do it only for N = 2, for the sake of simplicity. Let i 1 , i 2 ∈ [ [2, n]]. By definition of A t , since X 1 and X 2 are independent copy of X, which proves the desired result. Thus, going back to (5.6) and thanks to (3.8) we have Finally, we obtain where in the second line, N and (i k ) k=1..N are given by the labelling function for each C ∈ E n , and using that P C {τ n > t} ≤ 1 in the last inequality. Remark the expression of H in n is obtained thanks to the definition of M i,n in (3.8) and f n i in (5.2).
Proof of the Theorem 5.2. Let the collection of processes X 1 , X 2 , . . . (resp. Y 1 , Y 2 , . . .) being a particle description of the BD process that starts from the initial distribution Π in (resp. from δ 0 ). We couple the processes X 1 , X 2 , . . . to the processes Y 1 , Y 2 , . . . as in the proof of Theorem 5.2, namely, Y k (t) = X k+N in (t) for all k ≥ 1 and all t ≥ 0, where N in is distributed according to Π in . To avoid notation confusion, we write τ X n and τ Y n the first exit time from E n for the collection of processes {X k } and {Y k }, respectively. We define τ 0,X n and τ 0,Y n , respectively to the processes {X k } and {Y k } likewise τ 0 n in (5.4). Due to the coupling between the {X k } and {Y k }, we have Remark that each Y k , for k ≥ 1, is independent of X i for i ≤ N in , thus independent of T 1 n , . . . , T N in n the exit times arising in (5.3). Hence, by (5.3) and (5.7), the laws of the collection of processes {Y k } conditioned to τ Y n > t equals to the laws of the collection of processes {Y k } conditioned to τ X n > t. Also, we have, for any i ≥ 2 and t > 0, # {k | X k (t) = i} = # {k | Y k (t) = i} on the event A t , given in (5.5), since all initial particles being absorbed. Finally, we deduce that The latter, with Lemma 5.3, entails We end the proof by triangular inequality.

Estimates on τ n and the largest cluster
In this section we consider the super-critical case z > z s . The analysis of the τ n in (5.1) leads off the simple observation where X 1 , X 2 , . . . is the particle description of the BD process. We prove Theorem 6.1. Under assumption (H0). Let z > z s and Π in a probability distribution on In fact, as in the coupling strategy, (5.3) provides a useful understanding of the statistics of τ n by decomposing between the initial cluster from the ones that will appear at later times. We start with the statistics of the later, namely of τ 0 n . The next lemma clearly applies for the initial distribution Π in = δ 0 but is more general, and related to a maximum principle in the BD model [3].

Lemma 6.2.
Under assumption (H0) and z > z s . For any probability distribution Π in on E n such that for all i ∈ [[2, n]] and k ∈ N, Proof of Lemma 6.2. It is classical that condition (6.1) ensures there exists randoms C in and C qsd distributed according to Π in and Π qsd , respectively, such that for each , a.s. see e.g. [5,Sec. 4.12]. Then, we may construct the collection of processes X 1 , X 2 , . . . (resp. Y 1 , Y 2 , . . .) as a particle description of the BD process associated to C in (resp. to C qsd ) such that, a.s., for all i ≥ 1, X i (0) ≤ Y i (0). A standard coupling between two copies of the chain X from Sec. 3 consists in having the same jumps in the two copies as soon as they are equal. Such coupling applied to each couple (X i , Y i ) then ensures that, for all i ≥ 1 and t ≥ 0, we have X i (t) ≤ Y i (t) a.s. In particular,  Let ψ n (t, x) = 1 if max 0≤τ ≤t x(τ ) ≤ n and 0 otherwise. We have where X 1 , X 2 , . . . the particle description of the BD process C (starting at δ C ). By independence of the particles conditionally to their initial condition, where again, N and (i k ) k=1..N are given by the labelling function associated to C ∈ E n . Combining relations (6.2) and (6.3) with Lemma 6.2 and summing over all initial conditions ends the proof.

Metastability close to z s
In this section we assume additionally to (H1) and (H2), to fit with [10,11], that which is arbitrary close to one for times t ≪ 1/J n * as f n * i /(Q i z i ) → 1 when z z s .
Moreover, we have P Π in {C(t) ∈ · | τ n * > t} − Π qsd n ≤ ( H in j G in j + H qsd n * )K n * e J n * t−γ n * t (7.2) which is arbitrary small for times 1/γ n * ≪ t ≪ 1/J n * . Indeed, note that a i Q i z i is decreasing up to the size n * , thus K 2 n * ≤ a1z A ′ n * . Hence, K n * as well as H qsd n * are at most algebraically large. Equations (7.1)-(7.2) show the QSD is indeed a metastable state.