Propagation of chaos for a General Balls into Bins dynamics

Consider $N$ balls initially placed in $L$ bins. At each time step take a ball from each non-empty bin and \emph{randomly} reassign the balls into the bins.We call this finite Markov chain \emph{General Repeated Balls into Bins} process. It is a discrete time interacting particles system with parallel updates. Assuming a \emph{quantitative} chaotic condition on the reassignment rule we prove a \emph{quantitative} propagation of chaos for this model. We furthermore study some equilibrium properties of the limiting nonlinear process.


Introduction
Consider N balls initially placed in L bins. We take a ball from each non-empty bin and we randomly reassign the balls into the bins. We iterate independently this procedure at each time step. The random evolution of the number of balls in each bin is an ergodic finite state Markov chain that we call General Repeated Balls-into-Bins (GRBB) process. A particular case of the GRBB process is the Repeated Balls-into-Bins (RBB) process studied in [2] and [5] where the balls are uniformly and independently reassigned into bins. In the GRBB process the random reassignment has a general distribution. The systems in this class are conservative interacting particles systems, in discrete time, with parallel updates. Usually the GRBB process is not reversible and its invariant measure can be difficult to compute.
The GRBB process, as the RBB process, appears naturally in different applicative contexts. For example we can think to balls in every bin as customers in a queue. Customers are served at discrete times and each served customer is reassigned to a random queue. In this setting the GRBB process is a discrete time closed Jackson network [6,8]. The parallel updating is justified (see for example [2]) by thinking to customers as tasks (or tokens) in a network of parallel CPU which are reassigned at every round.
In this paper we are interested in the behavior of the GRBB process for large L. In [5] we studied this problem for the RBB process. We proved that starting from an initial distribution symmetric and such that, as L → +∞, the number of balls in each bin becomes stochastically independent, these properties are preserved for any finite time. This phenomenon is called propagation of chaos [7,11]. The limiting evolution of the system is described by a nonlinear Markov process. The interesting fact, citing [11], is that "the study of every individual gives information on the behavior of the group". The price to pay for this simplification is that the limiting process evolves accordingly to a nonlinear equation. In the present paper assuming a quantitative chaotic condition on the reassignment rule, see Condition 2.3, we prove a quantitative propagation of chaos. Quantitative here means that we give the explicit rate of convergence of the empirical measure of the GRBB process to the nonlinear process distribution as L → +∞, see Theorem 2.5. The quantitative chaotic condition on the reassignment rule is strong but natural as can be seen as the distance between a canonical and grand canonical measures.
Propagation of chaos is a largely studied topic in literature, see for example [11] and references therein for an introduction. In general results on propagation of chaos includes diffusions with jumps. Recently the study of neural networks motivated the introduction of models with simultaneous jumps, see for example [1] and references therein. As parallel jumps may interfere with asymptotic independence propagation of chaos for these models is an interesting field. In particular in [1] the authors prove propagation of chaos for a wide class of models with simultaneous jumps not including the GRBB process.
The paper is organized as follows. In Section 2 we define the GRBB process and the nonlinear process, we then prove the theorem on quantitative propagation of chaos. In Section 3 we apply, using couplings and Pólya urn techniques, Theorem 2.5 to three cases of the GRBB process depending on different choices of the reassignment rule: Fermi-Dirac, Maxwell-Boltzmann and Bose-Einstein statistics.
In the last section we study the long time behavior of the nonlinear process.

Construction and main result
We denote with Z + the set of the non-negative integers and define N := Z + \ {0}. For any denumerable set S we denote with |S| its cardinality. Furthermore we denote with P(S) the metric space of probability measures on S endowed with the total variation distance We define the empirical measure function ρ L : Z L + → P(Z + ) as 2 and the map w L : Z L + → {0, 1} L , as w L (ξ) := (1(ξ 1 > 0), . . . , 1(ξ L > 0)).
To keep notation simple in the following we denote with the term constant a positive number which does not depend on L. Furthermore, when this does not cause confusion, we use the same symbol to denote different constants.

Propagation of chaos of the GRBB process
We here define the GRBB process (η L (t)) t≥0 and its corresponding nonlinear process (η(t)) t≥0 .

Definition 2.1
The GRBB process (η L (t)) t≥0 is the Markov chain with values in Z L + defined as follows. Assume that, for some t ≥ 0, η L (t) = ξ ∈ Z L + and q := ρ L (ξ) then It is independent from everything, independently generated at each time step t and satisfies Last equation assures the conservation of the number of particles for the GRBB process.
Definition 2.2 Given a measurable map ψ : P(Z + ) → P(Z + ) we define the ψ-nonlinear process (η(t)) t≥0 as the random process with values in Z + defined as follows. For some t ≥ 0 let q ∈ P(Z + ) be the distribution of η(t) and assume η(t) = η ∈ Z then B q is a random variable with values in Z + . It has distribution ψ(q), independent from everything and is independently generated at each time step t.
We want to provide a quantitative estimate on the rate of convergence of the empirical measure of the GRBB process to the distribution of a corresponding nonlinear process. The following is a sufficient condition to this aim.

Condition 2.3
The distribution of B L,q is symmetric. For any q ∈ P(Z + ) denote with ν q L ∈ P(Z 2 + ) the distribution of (B L,q 1 , B L,q 2 ) and assume that there exists µ q ∈ P(Z + ) and a constant C such that sup In the context of Gibbs measures the above condition is rater natural. It can be interpreted as an equivalence of ensambles estimate (see for example [4]) because it states that the distance between the two sites marginal of the canonical and grand canonical ensamble decreases as the inverse of the volume L.
Among the ψ-nonlinear processes we need to choose the one that gives the limiting evolution of the GRBB process. This is done in the following definition.
Definition 2.4 Given a GRBB process, such that the random vector B L,q satisfies Condition 2.3, the corresponding nonlinear process is the ψ-nonlinear process with ψ(q) := µ q .
We can now state our main result on propagation of chaos.
Theorem 2.5 Let (η L (t)) t≥0 be a GRBB process with symmetric initial distribution and such that sup L E(η L 1 (0)) < +∞. Assume that Condition 2.3 holds. Let (η(t)) t≥0 be the corresponding nonlinear process and assume that ψ is Lipschitz. Define Q L (t) := ρ L (η L (t)) and let Q(t) be the distribution of η(t). If there exists a constant C such that Proof. First of all observe that P sup Thus it is enough to show that for any t ∈ [0, T ] there exists a constant C such that We prove (2.4) by induction on t. By hypothesis (2.4) is true for t = 0. We assume it holds for some t ≥ 0 and prove it for t + 1.
Adding and subtracting terms we have that Thus for any δ > 0, We bound separately the three terms on the right hand side of (2.6) with bounds smaller that C/ √ L, for a suitable constant C. The last one can be bounded by observing that by Lipschitz condition on F we have and using inductive hypothesis (2.4).
To bound the second term of (2.6) we observe that the empirical process (Q L (t)) t≥0 is a Markov chain with values in P(Z + ). Its evolution can be described in the following way. Assume that Q L (t) = q and define for k ∈ N the discrete intervals where we define [a, b] = ∅ when a > b. Then By equation (2.5) and the definition of Λ q k (2.7) we have that Thus using (2.8), conditionally to Q L (t) = q, Due to Condition 2.3 the last line is bounded by C/L. Markov inequality gives We now bound the first term of (2.6). Define for any n ∈ Z + For the second term in (2.10) we observe that thus by Markov inequality We observe that because, by (2.2), the number of particles of the system is preserved. Thus by the symmetry of the distribution of η L (0) and the assumed uniform bound on for a suitable constant C ′ . For the first term in (2.10), using Markov and Cauchy-Schwarz inequalities we have

Which implies
Cov(Z L,q xk , Z L,q yh ). (2.14) In the sequel of this proof to keep notation simple we write The variance term in (2.14) can be bounded, using the symmetry of the distribution of B L,q , by The covariance term in (2.14) can be bounded, using the symmetry of the distribution of B L,q by Using the bounds (2.15) and (2.16) in (2.14), summing on x and y and changing the variables n + 1 − k → k and n + 1 − h → h we arrive to So, exchanging the sums to obtain the second inequality below, we get thus by (2.17) and Condition 2.3 there exists a constant C such that By (2.12) and (2.13) there is a constant C Plugging this bound and the bound il (2.11) in (2.10) and optimizing onn we arrive to for some constant C.

Classical occupancy models
In this section we consider the GRBB process when B L,q is distributed according to the Fermi-Dirac, Maxwell-Boltzmann and Bose-Einstein statistics. We will show that the hypothesis of Theorem 2.5 holds for these three classical occupancy models. The distribution µ q of Condition 2.3 is the natural limit point of the one site marginal of B L,q . In the last two cases the proof that Condition 2.3 holds uses coupling and Pólya urns arguments.

Fermi-Dirac statistics
We say that the random vector X = (X 1 , . . . , X L ) follows the Fermi-Dirac statistics with L sites and N ≤ L particles if Given q ∈ P(Z + ), let µ q ∈ P(Z + ) be the Bernoulli distribution with parameter 1 − q({0}) and assume that B L,q follows the Fermi-Dirac statistics with L sites and (1 − q({0}))L particles. The map ψ(q) := µ q is, in this case, 1-Lipschitz. To prove that Condition 2.3 holds we need the following result.
Theorem 3.1 Assume that X follows the Fermi-Dirac statistics with L sites and N particles. Denote with γ N L ∈ P(Z 2 + ) the distribution of (X 1 , X 2 ) and let λ N/L ∈ P(Z + ) be the Bernoulli distribution with parameter N/L. Then, Observing that and, because X 1 + · · · + X L = N, the result follows.
We observe that in the present case the GRBB process (η L (t)) t≥0 started with N ≤ L particles is ergodic and reversible with invariant measure given by the Fermi-Dirac statistics with L sites and N particles. Thus in this case propagation of chaos holds also at equilibrium. If N > L the GRBB process is not irreducible as there are blocked configurations. For completeness below we give an upper bound for the mixing time for the GRBB process.
Proposition 3.2 Let (η L (t)) t≥0 be the GRBB process defined in Section 3.1. Assume that: Then (η L (t)) t≥0 is ergodic its invariant is the Fermi-Dirac distribution with L sites and N particles. Furthermore, for L large enough, Proof. Define the decreasing sequence of events A 1 ⊇ A 2 ⊇ · · · ⊇ A N , where and the increasing sequence 0 := τ 1 ≤ · · · ≤ τ N of hitting times We notice that τ N is the first time such that in every site there is at most one particle and that at time t = τ N + 1 the system is distributed with its stationary measure independently from its state at time t = τ N . So τ N + 1 is a strong stationary time and we can use Proposition 6.11 of [9] to get t mix ≤ inf t ≥ 0 : P(τ N + 1 > t) ≤ 1/4 . By Markov inequality we have t mix ≤ 4 E(τ N +1)+1. To bound E(τ N ) we introduce, for n = 1, . . . , N − 1, the random variables σ n := τ n+1 − τ n so that Now observe that σ n > t if and only if η L (τ n + t) ∈ A n \ A n+1 . By strong Markov property

3)
A configuration in A n \ A n+1 is a configuration where there are n mobile particles, N − n blocked particles and L − n empty sites. So, if η L (0) = η ∈ A n \ A n+1 , we have that η L (1) ∈ A n \A n+1 if and only if the process puts a mobile particle on every site occupied by a blocked particle. Let p k,m be the probability that, following the Fermi-Dirac statistics with L sites and m ≤ L particles, the sites {1, . . . , k} are occupied. Thus ifk ∈ {1, . . . , N − n} is the number of sites occupied by the blocked particles in the configuration η, then P(η L (1) ∈ A n \ A n+1 |η L (0) = η) = pk ,n ≤ p 1,n = n/L.

Plugging this bound into (3.3) we get
So, by summing the geometric series, E(σ n ) ≤ L/(L − n) and by (3.2) and the result follows.

Maxwell-Boltzmann statistics
We say that the random vector X = (X 1 , . . . , X L ) follows the Maxwell-Boltzmann statistics with L sites and particles N if Given q ∈ P(Z + ), let µ q ∈ P(Z + ) be the Poisson distribution with parameter 1−q({0}) and assume that B L,q follows the Maxwell-Boltzmann statistics with L sites and (1 − q({0}))L particles. In this case the GRBB process is the RBB process studied in [5] and [2]. To apply Theorem 2.5 we have to show that Condition 2.3 holds and that the map ψ is Lipschitz for this model. To check Lipschitz property we take q, q ′ ∈ P(Z + ) and q 0 := q({0}),

Condition 2.3 is implied by the following result.
Theorem 3.3 Assume that X follows the Maxwell-Boltzmann statistics with L sites and N particles. Denote with γ N L ∈ P(Z 2 + ) the distribution of (X 1 , X 2 ) and let λ N/L ∈ P(Z + ) be the Poisson distribution with parameter N/L. Then, Proof. Letγ N L := P(X 1 ∈ ·) be the one site marginal of γ N L .
where, in the last line, we used the fact that for arbitrary probability measures γ, µ and ν: We bound separately the 2 terms in equation (3.4).
The second term is bounded using Poisson approximation of binomial distribution (see for example [10] §12) so that (3.5) To bound the first term in (3.4) we construct a coupling, namely we define (X 1 , X 2 ) ∼ γ N L and (Y 1 , Y 2 ) ∼γ N L ⊗γ N L . We take X 1 as a binomial random variable with parameters N and 1/L, i.e. X 1 ∼γ N L and consider U 1 , . . . U N i.i.d. uniformly distributed random variables with values in [0, 1] independent from X 1 . Define next where here and in the sequel we use the convention that 0 k=1 := 0. An elementary computation shows that (X 1 , X 2 ) has the distribution of the two components of a random vector following the Maxwell-Boltzmann statistics with N particles an L sites, i.e. (X 1 , X 2 ) ∼ γ N L . Define Y 1 := X 1 and Clearly Y 1 and Y 2 are i.i.d. random variables with common binomial distribution with parameters N and 1/L, i.e. (Y 1 , Y 2 ) ∼γ N L ⊗γ N L . Thus, defining the event We bound separately the two terms on the right hand side of equation (3.6). For the first one, we observe that the only way to have that A occurs and X 2 = Y 2 is that Thus, by the independence of X 1 , U 1 , . . . , U N : Using the binomial distribution of X 1 , an explicit computation shows that For the second term in equation (3.6) using again the independence of X 1 , U 1 , . . . , U N we can write

Bose-Einstein statistics
We say that the random vector X = (X 1 , . . . , X L ) follows the Bose-Einstein statistics with L ∈ N sites and particles N ∈ N if Given q ∈ P(Z + ), let µ q ∈ P(Z + ) be the geometric distribution supported on Z + and parameter 1/(2 − q({0})) and assume that B L,q follows the Bose-Einstein statistics with L sites and (1−q({0}))L particles. To apply Theorem 2.5 we have to show that Condition 2.3 holds and the map ψ is Lipschitz for this model. To check Lipschitz property proceeding as in Section 3.2 we get for any q, q ′ ∈ P(Z + )

Condition 2.3 is implied by the following result.
Theorem 3.4 Assume that X follows the Bose-Einsten statistics with L sites and N particles. Denote with γ N L ∈ P(Z 2 + ) the distribution of (X 1 , X 2 ) and let λ N/L ∈ P(Z + ) be the geometric distribution with support Z + and parameter 1/(1 + N/L): Proof. Letγ N L := P(X 1 ∈ ·) be the one site marginal of γ N L . As in (3.4) we get We bound separately the two terms in the right hand side of inequality (3.9). For the second one we use Theorem 3 of [12] to get To bound the first term in we construct a coupling. We define (X 1 , X 2 ) ∼ γ N L and (Y 1 , Y 2 ) ∼ γ N L ⊗γ N L . We generate X 1 as the 1 site marginal of X, namely X 1 ∼γ N L and define Y 1 := X 1 . Given X 1 = n, to generate X 2 and Y 2 we consider two urns named urn A and urn B. Initially in urn A there are L − 1 balls numbered from 2 to L, while in urn B there are L balls numbered from 1 to L. We distinguish two cases, case n < N and case n = N.
Case n < N. We draw a ball from urn A, assume that it is ball k, then we try to extract the same ball from urn B. To this end, independently, we generate a Bernoulli random variable with success probability (L − 1)/L. In case of success we extract the ball k from urn B; in case of failure we extract the ball 1 from urn B. The extracted balls are then returned in their urns with a ball with the same number. We call this replacement rule double replacement.
The next extractions are defined inductively. Assume that t extractions have been made with 0 < t < N − n. For k ∈ {2, . . . , L} let t k be the number of balls k drawn from urn A in the t extractions and f k , the number of times that the attempt to extract the same ball k from urn B failed in the t extractions. For the t + 1 extraction we draw a ball from urn A and make a test by generating a Bernoulli random variable with success probability In case of success we extract a ball k from urn B; in case of failure we extract a ball 1 from urn B. We then use double replacement. We iterate the preceding rule until t = N − n. Next we go on extracting a ball from urn B with double replacement for other n steps. Define X 2 as the number of times that ball 2 has been drawn from urn A and Y 2 as the total number of times that ball 2 has been drawn from urn B.
Case n = N. We define X 2 = 0. To define Y 2 we draw a ball from urn B with double replacement. Define Y 2 as the number of times that ball 2 has been drawn from urn B.
We claim that in both cases (X 1 , The first claim is a Bose-Einstein statistics property: for any n, m ∈ Z + such that n + m ≤ N where γ 0 L−1 := δ 0 . To prove that (Y 1 , Y 2 ) ∼γ N L ⊗γ N L note that for any n, m ∈ {0, . . . , N} so the result follows if we can show that P(Y 2 ∈ ·|X 1 = n) =γ N L , n ∈ {0, . . . , N}. (3.13) To prove (3.13) we will show that Y 2 , conditionally to X 1 = n, is the number of balls 2 extracted in N draws from an L-color Pólya urn (an urn with initially L balls numbered from 1 to L from which balls are drawn with double replacement). Then, by Lemma 2.7 of [9], after N extractions the number of times that ball 2 has been extracted hasγ N L distribution. Let T k (t), k ∈ {1, . . . , L}, be the number of balls k drawn from an L-color Pólya urn in t steps. Then T (t) := (T 1 (t), . . . , T L (t)) is a homogeneous time Markov chain, see for example [9] §2.4, called Pólya urn process. If n = N (3.13) holds because by construction B is a Pólya urn. Assume that n < N. To show that B is a Pólya urn we must verify that balls are uniformly drawn from B at each draw.
To compute the probability to extract a ball k from B we observe that if k ∈ {2, . . . , L}, k is extracted from B if and only if it is extracted from A and the test is a success, while ball 1 is extracted from B if and only if the test fails. So at the first extraction, ball k, k ∈ {2, . . . , L} is chosen from urn B with probability Ball 1 is chosen with probability 1/L. Assume that t extractions have been made with 0 < t < N − n. Recall that t k and f k , k ∈ {2, . . . , L}, denote the number of balls k drawn from urn A and the number of times that the attempt to extract the same ball k from urn B failed respectively in the t extractions. Then urn A has t k + 1 balls k while urn B has 1 + f 2 + · · · + f L balls 1 and 1 + t k − f k balls k, t 2 + · · · + t L = t.
Thus at the t + 1 extraction, ball k with k ∈ {2, . . . , L} is chosen from urn B with probability In any case the balls are chosen uniformly. We iterate the preceding rule until t = N − n.
At this time the urn B is an L-color Pólya urn process after N − n steps. Next we go on extracting a ball from urn B with double replacement for other n steps. So, after N extraction, B is an L-color Pólya urn process after N steps and Y 2 is the number of times ball 2 has been extracted in N draws, i.e. (3.13) holds. We observe that, as in (3.6), Define the event D as "a ball 2 has been drawn from urn A in the first N − X 1 extractions and the associated test failed" and the event E as "a ball 2 has been drawn from urn B in the last X 1 extractions". Then (3.14) If n = N the first term in (3.14) is zero. If n < N we write Observe that by (3.12) and that if X 1 = n and X 2 = j the event D c occurs if and only if all the j tests associated to the extractions of a ball 2 are success. By (3.11) the probability that the test at extraction t is a success, if all the preceding tests are successful (i.e. f 2 = 0), is Thus (3.17) We consider the second term in (3.14). If X 1 = n, B is an L-color Pólya urn from which N − n balls have been drawn. Because the L-color Pólya process is an homogeneous Markov chain, the number of times that a ball 2 will be draw in the last n extractions has distributionγ n L . Thus By plugging this bound and the bound (3.18) into equation (3.14) we get The above estimate, together with (3.10) concludes the proof.

Equilibrium properties of the nonlinear process
In this section we study the long time behavior of the ψ-nonlinear process, corresponding to the GRBB process. We will introduce some technical hypothesis on the nonlinear process which are satisfied in all the examples of Section 3. We need some additional notation. Given µ ∈ P(Z + ) we denote with m µ the mean of µ, with σ 2 µ its variance and withμ its characteristic function. The condition below is the analog of the conservation of particles, stated in equation (2.2), for the nonlinear process.  Particles conservation of the GRBB process gives conservation of the mean of the corresponding nonlinear process as explained in the following lemma. Proof. The proof is obtained by induction. Assume first that E(η(t)) = r < +∞. By equation (2.3): Then E (η(t + 1)) = E [E (η(t + 1)|η(t))] = r.
When r = +∞, again for equation (2.3), we have that η(t + 1) is obtained by adding a finite mean random variable to an infinite mean one.
To study the long time behavior of the ψ-nonlinear process we introduce the following discrete time queue process. Definition 4.4 Let µ ∈ P(Z + ). The G µ /D/1 queue (ζ(t)) t≥0 is the Markov chain with values in Z + defined as follows. Assume that, for some t ≥ 0, ζ(t) = ζ ∈ Z + then B is a random variable with distribution µ. It is independent from everything and independently generated at each time step t.
The long time behavior of the G µ /D/1 queue and its invariant probability measure are described in the theorem below.
Theorem 4.5 The G µ /D/1 queue, with m µ < 1, is an aperiodic irreducible positive persistent Markov chain. Its invariant probability measure π µ has characteristic function
Below we state a lemma which assures a uniform bound on exponential moments of the G µ /D/1 queue and will be used in the proof of Theorem 4.9.
The case λ = 0 is trivial.
The next technical condition is a thinning condition of the family {π µ q } q∈P(Z + ) . It holds for any of the application of Section 3.
Plugging this expression in the right hand side of equation (4.5) and solving it we get E e ixη =q (0)(e ix − 1)μq(x) e ix −μq(x) .