When does the chaos in the Curie-Weiss model stop propagating?

We investigate increasing propagation of chaos for the mean-field Ising model of ferromagnetism (also known as the Curie-Weiss model) with $N$ spins at inverse temperature $\beta>0$ and subject to an external magnetic field of strength $h\in\mathbb{R}$. Using a different proof technique than in [Ben Arous, Zeitouni; 1999], we confirm the well-known propagation of chaos phenomenon: if $k=k(N)=o(N)$ as $N\to\infty$, then the $k$-th marginal distribution of the Gibbs measure converges to a product measure if $\beta<1$ or $h \neq 0$, and to a mixture of two product measures if $\beta>1$ and $h=0$. More importantly, we also show that this property is lost if $k(N)/N\to \alpha\in (0,1]$, and we identify the non-zero limit of the total variation distance between the distribution of the number of positive spins among any $k$-tuple and the corresponding binomial distribution.


INTRODUCTION
The Curie-Weiss model is a mean-field model for ferromagnetism in statistical mechanics. It is described by a sequence of probability measures, the Gibbs measures µ_N, on the sets {−1, +1}^N. These measures are parametrized by a positive parameter β > 0 known as the inverse temperature and a real parameter h ∈ R interpreted as the strength of an external magnetic field. Given such β > 0 and h ∈ R, the Gibbs measure takes the following form:
$$
\mu_{N,\beta,h}(\sigma) = \frac{1}{Z_{N,\beta,h}} \exp\Big(\frac{\beta}{2N}\Big(\sum_{i=1}^{N}\sigma_i\Big)^2 + h\sum_{i=1}^{N}\sigma_i\Big), \qquad \sigma = (\sigma_1,\dots,\sigma_N) \in \{-1,+1\}^N.
$$
The normalizing constant Z_{N,β,h} is called the partition function of the model.
There is a vast literature on the Curie-Weiss model, with the main asymptotic results summarized in the textbooks [5], [11] and [14]; see also the papers [6, 10, 12]. The order parameter of the model is called the magnetization and is defined by
$$
m_N := m_N(\sigma) := \frac{1}{N}\sum_{i=1}^{N}\sigma_i = \frac{2P_N - N}{N},
$$
where P_N := P_N(σ) := |{i = 1, …, N : σ_i = +1}| is the number of positive spins. Let µ_N ∘ m_N^{−1} denote the distribution of the random variable m_N under the probability measure µ_N. The first-order limiting behavior of the magnetization is given by
$$
\mu_N \circ m_N^{-1} \Longrightarrow
\begin{cases}
\delta_{m(\beta,h)}, & \text{if } h \neq 0 \text{ or } 0 < \beta \le 1,\\[2pt]
\tfrac12\,\delta_{m(\beta,0)} + \tfrac12\,\delta_{-m(\beta,0)}, & \text{if } h = 0 \text{ and } \beta > 1,
\end{cases}
$$
which indicates a phase transition at β = 1 in the absence of the external magnetic field (h = 0). Here, =⇒ denotes weak convergence, δ_x is the Dirac measure at x, and m(β, h) is the largest-in-absolute-value solution to
$$
m = \tanh(\beta m + h). \tag{4}
$$
This solution is unique and positive if h > 0, unique and negative if h < 0, and equal to zero if h = 0 and 0 < β ≤ 1. If h = 0 and β > 1, equation (4) has two non-zero solutions m(β, 0) and −m(β, 0).
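As a quick numerical companion (ours, not part of the original argument), the fixed-point equation can be solved by simple iteration. The function name is ours, and we assume the standard Curie-Weiss convention m = tanh(βm + h) for equation (4).

```python
import math

def curie_weiss_magnetization(beta, h, tol=1e-12, max_iter=10_000):
    """Largest-in-absolute-value solution of m = tanh(beta*m + h).

    For h = 0 and beta <= 1 the only solution is m = 0; otherwise we
    iterate from a starting point on the relevant branch (sign of h,
    or +1 by convention when h = 0 and beta > 1).
    """
    if h == 0 and beta <= 1:
        return 0.0
    m = 1.0 if h >= 0 else -1.0   # start near the stable branch
    for _ in range(max_iter):
        m_next = math.tanh(beta * m + h)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m

# h = 0, beta > 1: spontaneous magnetization (two symmetric solutions)
m = curie_weiss_magnetization(1.5, 0.0)
```

The iteration converges because tanh(β·+h) is a contraction near the stable fixed point; for β = 1.5 and h = 0 one obtains m(1.5, 0) ≈ 0.858.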
For β > 1, still assuming h = 0, there is a conditional central limit theorem: √N(m_N − m(β, 0)) (respectively, √N(m_N + m(β, 0))), conditioned on m_N > 0 (respectively, m_N < 0), converges to a normal distribution. In this case the limiting expectation is 0 and the limiting variance is v²_{β,0}. Finally, if h ≠ 0, there are no phase transitions as β varies in (0, ∞), and √N(m_N − m(β, h)) converges to the centred normal distribution with variance v²_{β,h}.
1.1. Propagation of chaos and the main results. For the time being, fix k ∈ N and pick any k-tuple among the N spins. The propagation of chaos paradigm for Gibbs measures states that, for a mean-field model such as the Curie-Weiss model, the marginal distributions of the k spins become asymptotically independent. This shall be investigated in the sequel. Since the family of random variables (σ_i)_{i=1}^N is exchangeable under the Gibbs measure µ_N, without loss of generality we may pick the first k spins and consider their marginal distribution µ^{(k)}_{N,β,h}. Let P_k := |{j ∈ {1, …, k} : σ_j = +1}| be the number of positive spins among the picked ones. Note that P_k completely determines µ^{(k)}_{N,β,h}; hence we might as well study the distribution of P_k under µ_N, denoted by µ_N ∘ P_k^{−1}. Intuitively, if h = 0 and 0 < β < 1, as N → ∞, the first k spins should indeed be asymptotically independent and take the values ±1 with the same probability 1/2. This implies that the distribution of P_k should be close to the binomial distribution with parameters k and 1/2. This can formally be written as follows. Let Bin(n, p) denote a binomial distribution with parameters n ∈ N_0 and p ∈ [0, 1]:
$$
\mathrm{Bin}(n,p)\{j\} = \binom{n}{j} p^j (1-p)^{n-j}, \qquad j \in \{0,\dots,n\}.
$$
Throughout the paper we shall slightly abuse the notation and denote mixtures of distributions by simply writing a mixing distribution instead of the parameter which is being mixed. For example, for any distribution L on [0, 1],
$$
\mathrm{Bin}(n, L)\{j\} := \int_0^1 \mathrm{Bin}(n,p)\{j\}\, L(\mathrm{d}p), \qquad j \in \{0,\dots,n\}.
$$
Recall that the total variation distance d_TV between two probability measures M_1 and M_2 on R is defined by
$$
d_{\mathrm{TV}}(M_1, M_2) = \sup_{A \in \mathcal{B}(\mathbb{R})} |M_1(A) - M_2(A)|,
$$
where B(R) is the Borel sigma-algebra. Recall also that k ∈ N is fixed for the time being. The 'propagation of chaos' phenomenon for the Curie-Weiss model tells us that, if h = 0 and 0 < β < 1, then
$$
\lim_{N\to\infty} d_{\mathrm{TV}}\big(\mu_N \circ P_k^{-1},\, \mathrm{Bin}(k, 1/2)\big) = 0. \tag{8}
$$
On the other hand, for h = 0 and β > 1, it is known that
$$
\lim_{N\to\infty} d_{\mathrm{TV}}\Big(\mu_N \circ P_k^{-1},\; \tfrac12\,\mathrm{Bin}\big(k, \tfrac{1+m(\beta,0)}{2}\big) + \tfrac12\,\mathrm{Bin}\big(k, \tfrac{1-m(\beta,0)}{2}\big)\Big) = 0. \tag{9}
$$
Note that the approximating distribution is a mixture of two binomial distributions, one with success probability (1 + m(β, 0))/2 and the other with (1 − m(β, 0))/2. Finally, it is known that for h ≠ 0 (and arbitrary β > 0),
relation (8) holds with 1/2 replaced by (1 + m(β, h))/2. Assume now that k = k(N) depends on N and lim_{N→∞} k(N) = ∞. The aim of this note is to answer the question whether the analogues of (8) and (9) hold true in this case and, if not, what is the threshold on the growth of k(N) at which the limit in (8) or (9) becomes non-zero, and what is the value of the limit. Note that for k(N)/N → 0, that is, k = o(N), propagation of chaos has been explicitly shown to hold for β = 1 and h = 0 in [4] (note that the authors consider relative entropy rather than total variation distance and that their model also covers h ≠ 0 implicitly). Our answer is provided by Theorem 1.1 below, which is our main result. To simplify its formulation, let us introduce additional notation.
For m ∈ R and v² > 0 put
$$
\varphi(t; m, v^2) := \frac{1}{\sqrt{2\pi v^2}}\exp\Big(-\frac{(t-m)^2}{2v^2}\Big), \qquad t \in \mathbb{R}.
$$
That is to say, t ↦ ϕ(t; m, v²) is the density of a Gaussian distribution with mean m and variance v². Define
$$
D(v_1^2, v_2^2) := \frac{1}{2}\int_{\mathbb{R}} \big|\varphi(t; 0, v_1^2) - \varphi(t; 0, v_2^2)\big|\, \mathrm{d}t \tag{11}
$$
to be the total variation distance between two centred Gaussian distributions, which can easily be calculated in terms of the error function. Then our main result reads as follows.
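The error-function expression for the quantity in (11) can be made concrete; the following sketch (function names ours) uses the fact that two centred Gaussian densities with distinct variances cross at ±c with c² = v₁²v₂² log(v₂²/v₁²)/(v₂² − v₁²), and cross-checks the closed form against direct numerical integration of (11).

```python
import math
from scipy.integrate import quad

def tv_centered_normals(v1sq, v2sq):
    """Total variation distance between N(0, v1sq) and N(0, v2sq).

    The densities cross at +-c with
    c^2 = v1sq*v2sq*log(v2sq/v1sq)/(v2sq - v1sq),
    reducing the integral to a difference of error functions.
    """
    if v1sq == v2sq:
        return 0.0
    a, b = min(v1sq, v2sq), max(v1sq, v2sq)
    c = math.sqrt(a * b * math.log(b / a) / (b - a))
    return math.erf(c / math.sqrt(2 * a)) - math.erf(c / math.sqrt(2 * b))

def gauss_density(t, v):
    return math.exp(-t * t / (2 * v)) / math.sqrt(2 * math.pi * v)

# direct numerical evaluation of the integral in (11) for v1^2 = 1, v2^2 = 2
direct, _ = quad(lambda t: abs(gauss_density(t, 1.0) - gauss_density(t, 2.0)), -30, 30)
```

For v₁² = 1 and v₂² = 2 both routes give approximately 0.166.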
Let us now informally discuss the case α > 0. For simplicity, we consider (13). The limit on the right-hand side is non-zero, which suggests that there is a residual dependence between the k(N) spins under the Gibbs measure. The reason for the non-zero limit is that the distribution of P_{k(N)} and the corresponding binomial distribution satisfy central limit theorems with different variances, the variance of P_{k(N)} being strictly larger, which comes from the fact that the spins are positively correlated under the Gibbs measure. The distance between these normal distributions appears on the right-hand side of (13).
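This variance effect can be seen in a small numerical experiment (ours, not from the paper; it assumes the standard Curie-Weiss Gibbs weights proportional to binom(N,i)·exp(β(2i−N)²/(2N) + h(2i−N)) and the exchangeability-based hypergeometric mixture for the law of P_k): for h = 0 and β = 1/2 the total variation distance to Bin(k, 1/2) is tiny for a fixed small k but stays bounded away from zero for k = N.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import binom, hypergeom

def pk_distribution(N, beta, h, k):
    """Exact law of P_k (positive spins among the first k) under mu_N.

    Conditionally on P_N = i the count P_k is HyperG(N, i, k), so we
    mix hypergeometric pmfs over the exact law of P_N.
    """
    i = np.arange(N + 1)
    s = 2 * i - N                                    # total spin when i spins are +1
    logw = (gammaln(N + 1) - gammaln(i + 1) - gammaln(N - i + 1)
            + beta / (2 * N) * s**2 + h * s)         # log of binom(N,i) * Gibbs weight
    w = np.exp(logw - logw.max())
    pN = w / w.sum()                                 # exact law of P_N
    j = np.arange(k + 1)
    pk = np.zeros(k + 1)
    for ii, pr in zip(i, pN):
        pk += pr * hypergeom.pmf(j, N, ii, k)
    return pk

def tv(p, q):
    return 0.5 * np.abs(p - q).sum()

N = 400
tv_small = tv(pk_distribution(N, 0.5, 0.0, 3), binom.pmf(np.arange(4), 3, 0.5))
tv_full = tv(pk_distribution(N, 0.5, 0.0, N), binom.pmf(np.arange(N + 1), N, 0.5))
```

Here `tv_small` is nearly zero (chaos propagates for fixed k), while `tv_full` stays of order 0.16, consistent with the Gaussian-variance mismatch described above.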
In Theorem 3.5, we shall determine a mixed binomial distribution which approximates the distribution of P_{k(N)} under the Gibbs measure. In some sense, this describes the residual dependence between the spins under the Gibbs measure.
Remark 1.2. The exchangeability of the measure µ_N has been used to investigate the Curie-Weiss model, for example, in [17, Section 5.2] and [2]. In particular, an explicit representation of µ_N as a mixture of Bernoulli measures (valid for each fixed N) can be found in [17, Theorem 5.6]. A general propagation of chaos principle, stating that the distribution of k entries in a finite exchangeable vector of length n can be approximated by a mixture of i.i.d. distributions, can be found in [7].
The paper is organized as follows. Our proof relies on local limit theorems for the magnetization m_N and also for the total number of positive spins P_N under µ_N. In some regimes these are known. We collect the corresponding results in Section 2 below. The proofs of these local limit theorems, which we have not been able to locate in the literature, are given in Section 4. The proof of Theorem 1.1 is given in Section 3, including the statement on residual dependence. Two auxiliary technical results related to calculations of the total variation distance are presented in Section 5.

LOCAL LIMIT THEOREM FOR THE MAGNETIZATION
Denote by N(m, v²) a Gaussian distribution with mean m and variance v², so that its density is t ↦ ϕ(t; m, v²). Since Nm_N always has the same parity as N, the lattice of possible values of m_N has span 2/N, and a corresponding correction term δ_N appears below in the local limit theorems for m_N.
Proposition 2.1 states that if h ≠ 0 or β ∈ (0, 1), then m_N concentrates around m(β, h) and the local limit theorem (16), with Gaussian limit N(m(β, h), v²_{β,h}/N), holds true. Proposition 2.2 states that if h = 0 and β > 1, the analogous local limit theorem (17) holds, with the limit replaced by the mixture ½N(m(β, 0), v²_{β,0}/N) + ½N(−m(β, 0), v²_{β,0}/N). For some values of (β, h) the above local limit theorems can be extracted from the vast literature on the Curie-Weiss model. For example, in the high-temperature regime β ∈ (0, 1) and for every h ∈ R, (16) has been proved in [19, Theorem 4.5 and Eq. (4.4)]. If h > 0 and β > 0, then (16) can be found in [1, Theorem 2.14 and Lemma 1.1]. The missing case h < 0 and β > 0 in Proposition 2.1 can be derived by the same methods. Finally, if h = 0, a local limit theorem for a multi-group Curie-Weiss model in the high-temperature regime β ∈ (0, 1) has been derived in [13]. Quite (un-)expectedly, we have not been able to locate Proposition 2.2 in the literature, because of the non-standard approximation by a mixture of normal distributions. We shall give an elementary proof based on the Stirling approximation in Section 4.
We shall actually need local limit theorems for P_N rather than m_N. They follow immediately from Propositions 2.1 and 2.2 using the obvious relation P_N = N(1 + m_N)/2 together with the Lipschitz bound
$$
|\varphi(t; m, v^2) - \varphi(s; m, v^2)| \le \frac{C}{v^2}\,|t - s|, \qquad t, s \in \mathbb{R},
$$
for some absolute constant C > 0, which is a consequence of the mean value theorem for differentiable functions. The above bound allows us to neglect the correction term δ_N appearing in the local limit theorems for m_N.
Corollary 2.3. Assume that h ≠ 0 or β ∈ (0, 1). Then P_N/N concentrates around (1 + m(β, h))/2 and the corresponding local limit theorem, with Gaussian limit N(N(1 + m(β, h))/2, Nv²_{β,h}/4), holds true. Corollary 2.4. Assume that h = 0 and β > 1. Then the analogous local limit theorem holds, with the limit replaced by the mixture ½N(N(1 + m(β, 0))/2, Nv²_{β,0}/4) + ½N(N(1 − m(β, 0))/2, Nv²_{β,0}/4).
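As a sanity check (ours), one can compute the exact pmf of P_N for h = 0 and β > 1 and observe the two symmetric peaks predicted by Corollary 2.4. The code assumes the standard Curie-Weiss Gibbs weights; function names are ours.

```python
import numpy as np
from scipy.special import gammaln

def pN_pmf(N, beta, h=0.0):
    """Exact pmf of P_N under the Curie-Weiss Gibbs measure (our normalization)."""
    i = np.arange(N + 1)
    s = 2 * i - N
    logw = (gammaln(N + 1) - gammaln(i + 1) - gammaln(N - i + 1)
            + beta / (2 * N) * s**2 + h * s)
    w = np.exp(logw - logw.max())
    return w / w.sum()

N, beta = 2000, 1.5
p = pN_pmf(N, beta)
# spontaneous magnetization m(1.5, 0) ~ 0.858, so the peaks sit near N(1 -+ m)/2
lower_peak = int(np.argmax(p[: N // 2]))
```

For h = 0 the pmf is exactly symmetric under i ↦ N − i (global spin flip), and the lower peak lands near N(1 − m)/2 ≈ 142.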

PROOF OF THEOREM 1.1
We embark on a simple observation which is a consequence of the exchangeability of the spins under the Gibbs measure µ_N. Given that P_N = i ∈ N_0, the conditional distribution of P_k is hypergeometric with parameters N, i and k, denoted hereafter HyperG(N, i, k). Recall that
$$
\mathrm{HyperG}(N, i, k)\{j\} = \frac{\binom{i}{j}\binom{N-i}{k-j}}{\binom{N}{k}}, \qquad 0 \le j \le k.
$$
In other words, the distribution of P_{k(N)} can be represented as the following mixture of hypergeometric distributions:
$$
\mu_N \circ P_{k(N)}^{-1} = \mathrm{HyperG}\big(N,\, \mu_N \circ P_N^{-1},\, k(N)\big).
$$
The family of hypergeometric distributions possesses the following property, which is of major importance for us.
Lemma 3.1. For every N ∈ N, k ∈ {0, …, N} and p ∈ [0, 1], it holds that HyperG(N, Bin(N, p), k) = Bin(k, p).
Proof. For 0 ≤ j ≤ k, one checks directly that
$$
\sum_{i=0}^{N} \binom{N}{i} p^i (1-p)^{N-i}\, \frac{\binom{i}{j}\binom{N-i}{k-j}}{\binom{N}{k}} = \binom{k}{j} p^j (1-p)^{k-j}.
$$
Alternatively, we can argue probabilistically: if each of N balls is colored black or white with probability p and 1 − p, respectively, and then a sample of k balls is drawn at random from the N balls, then the number of black balls in the sample has binomial distribution with parameters (k, p).
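The collapse of the binomial mixture of hypergeometric distributions in Lemma 3.1 is easy to verify numerically; the following sketch (helper names ours) checks it exactly for one choice of parameters.

```python
import numpy as np
from scipy.stats import binom, hypergeom

def mixed_hypergeom_pmf(N, k, mixing_pmf):
    """pmf of HyperG(N, L, k): the number of success balls among N is drawn
    from the mixing law L, then k balls are sampled without replacement."""
    j = np.arange(k + 1)
    out = np.zeros(k + 1)
    for i, w in enumerate(mixing_pmf):
        out += w * hypergeom.pmf(j, N, i, k)
    return out

N, k, p = 30, 7, 0.3
mixed = mixed_hypergeom_pmf(N, k, binom.pmf(np.arange(N + 1), N, p))
target = binom.pmf(np.arange(k + 1), k, p)   # Lemma 3.1: the mixture collapses
```
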
The subsequent proof of Theorem 1.1 proceeds according to the following scheme: STEP 1. We approximate L(P_N) by an appropriate mixed binomial distribution Bin(N, L(W)), where W is a random variable taking values in [0, 1] and L(X) denotes the distribution of a random variable X.
The approximation is understood in the sense of the d_TV-distance which, as we shall show, converges to 0.
To accomplish this step we employ the local limit theorems for P_N provided by Corollaries 2.3 and 2.4.
It turns out that as the mixing distribution we can take for W a beta distribution (or a mixture of two beta distributions) with properly adjusted parameters.
STEP 2. Using Lemma 3.1, we then approximate the µ_N-distribution of P_{k(N)} by the mixed binomial distribution Bin(k(N), L(W)). Here we use the coupling characterization of the total variation distance,
$$
d_{\mathrm{TV}}(M_1, M_2) = \inf \mathbb{P}(X \neq Y),
$$
where the infimum is taken over all pairs (X, Y) of random variables such that X is distributed according to M_1 and Y is distributed according to M_2. STEP 3. We derive a local limit theorem for Bin(k, L(W)).
STEP 4. We calculate the total variation distance between Bin(k, L(W)) and the three binomial distributions appearing in Theorem 1.1 by using local limit theorems for binomial distributions in conjunction with another well-known formula for d_TV: if the measures M_1 and M_2 are supported on Z, then
$$
d_{\mathrm{TV}}(M_1, M_2) = \frac{1}{2}\sum_{n \in \mathbb{Z}} |M_1\{n\} - M_2\{n\}|;
$$
see Proposition 5.1 and Proposition 5.2 below.
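The half-ℓ¹ formula for integer-supported measures is a one-liner in code; the following sketch (function name ours) instantiates it for two binomial pmfs on a common grid.

```python
import numpy as np
from scipy.stats import binom

def tv_discrete(p, q):
    """d_TV between two pmfs given on the same integer grid: half the l1 distance."""
    p, q = np.asarray(p), np.asarray(q)
    return 0.5 * np.abs(p - q).sum()

j = np.arange(21)
d = tv_discrete(binom.pmf(j, 20, 0.5), binom.pmf(j, 20, 0.7))
```
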
Our implementation of Steps 1-3 relies on the next proposition. For α, β > 0, Beta(α, β) denotes a beta distribution with the density
$$
f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \qquad x \in (0, 1),
$$
where B is the Euler beta-function. In what follows we assume that all auxiliary random variables are defined on some probability space (Ω, F, P).
Proposition 3.2. Assume that (Θ_N)_{N∈N} is a sequence of N_0-valued random variables that satisfy a local limit theorem of the form (21), with a fixed integer K ∈ N, a collection of positive weights p_1, …, p_K summing to 1, centerings Na_1, …, Na_K and variances Nσ²_1, …, Nσ²_K such that σ²_j > a_j(1 − a_j) for all j. For j = 1, …, K put
$$
s_j := \frac{a_j(1-a_j)}{\sigma_j^2 - a_j(1-a_j)}, \qquad \gamma_{j,1} := a_j s_j, \qquad \gamma_{j,2} := (1-a_j)s_j.
$$
Suppose that k = k(N) is a sequence of positive integers such that (12) holds for some α ∈ [0, 1]. For N ∈ N, let X_N be a random variable with the mixed hypergeometric distribution HyperG(N, L(Θ_N), k(N)) and let Y_{N,k(N)} be a random variable with the mixed binomial distribution Bin(k(N), ∑_{j=1}^{K} p_j Beta(γ_{j,1}N, γ_{j,2}N)). Then
$$
\lim_{N\to\infty} d_{\mathrm{TV}}\big(\mathcal{L}(X_N), \mathcal{L}(Y_{N,k(N)})\big) = 0.
$$
Remark 3.3. We shall use this proposition with Θ_N = P_N and K = 1, p_1 = 1 (in conjunction with Corollary 2.3) or K = 2, p_1 = p_2 = 1/2 (in conjunction with Corollary 2.4).
Remark 3.4. Let us provide an informal explanation of Proposition 3.2. Consider for simplicity the case K = 1, p_1 = 1. Then (21) states that Θ_N satisfies a local limit theorem with asymptotic centering Na_1. One is therefore tempted to approximate Θ_N by the binomial distribution Bin(N, a_1); however, the variance of this distribution, which equals Na_1(1 − a_1), is strictly smaller than the asymptotic variance Nσ²_1 appearing in (21), due to the assumption σ²_1 > a_1(1 − a_1). Instead, we approximate Θ_N by a mixed binomial distribution that artificially blows up the variance until the variances match. The informal explanation of this is that both distributions satisfy a local limit theorem with the same centering and normalization. In the case k(N) = N, we have Θ_N =_d X_N and Proposition 3.2 states that the total variation distance between the distribution of X_N and Bin(N, Beta(γ_{1,1}N, γ_{1,2}N)) converges to 0. Moreover, in the formal limit σ²_1 ↓ a_1(1 − a_1), we retrieve Lemma 3.1.
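The variance blow-up of Remark 3.4 can be checked by direct moment computation. The explicit parameter formulas below are our reconstruction of the moment matching (they require σ² > a(1 − a), as in the remark), and the beta-binomial moment formulas are standard.

```python
def matched_beta_params(a, sigma2):
    """Our reconstruction of the moment matching: gamma1, gamma2 such that
    Bin(N, Beta(gamma1*N, gamma2*N)) has mean N*a and variance ~ N*sigma2.
    Requires sigma2 > a*(1 - a)."""
    s = a * (1 - a) / (sigma2 - a * (1 - a))
    return a * s, (1 - a) * s

def beta_binomial_mean_var(n, alpha, beta):
    """Exact mean and variance of the beta-binomial Bin(n, Beta(alpha, beta))."""
    mean = n * alpha / (alpha + beta)
    var = (n * alpha * beta * (alpha + beta + n)
           / ((alpha + beta) ** 2 * (alpha + beta + 1)))
    return mean, var

a, sigma2, N = 0.7, 0.3, 10_000      # note sigma2 > a*(1-a) = 0.21
g1, g2 = matched_beta_params(a, sigma2)
mean, var = beta_binomial_mean_var(N, g1 * N, g2 * N)
```

The matched mean is exactly Na, and the variance per spin approaches σ² with an O(1/N) correction, illustrating how the beta mixing inflates Na(1 − a) up to Nσ².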
Proof of Proposition 3.2. We start by noting that, for fixed j = 1, …, K, the pair (γ_{j,1}, γ_{j,2}) is the unique solution to the following system of equations:
$$
N\,\frac{\gamma_{j,1}}{\gamma_{j,1}+\gamma_{j,2}} = N a_j, \qquad
N\,\frac{\gamma_{j,1}\gamma_{j,2}(\gamma_{j,1}+\gamma_{j,2}+1)}{(\gamma_{j,1}+\gamma_{j,2})^3} = N\sigma_j^2.
$$
On the left-hand side of the first equation we recognize the mean of Y_{N,N}, whereas the left-hand side of the second equation is equal to the variance of Y_{N,N} up to a term O(1). This suggests that the distribution of Θ_N is close to the distribution of Y_{N,N}. In fact, assume that we have proved
$$
\lim_{N\to\infty} d_{\mathrm{TV}}\big(\mathcal{L}(\Theta_N), \mathcal{L}(Y_{N,N})\big) = 0. \tag{24}
$$
According to (19), there then exists a sequence of pairs (Θ'_N, Y'_{N,N}) with Θ'_N =_d Θ_N, Y'_{N,N} =_d Y_{N,N} and P(Θ'_N ≠ Y'_{N,N}) → 0, which yields the claim as detailed in Step 2 below. It thus remains to prove (24). To this end, we shall check that Y_{N,N} satisfies exactly the same local limit theorem as Θ_N. We shall actually prove a stronger result for later considerations, namely a local limit theorem (26) for Y_{N,k(N)} with centerings k(N)a_j and variances k(N)σ²_{α,j}, where
$$
\sigma_{\alpha,j}^2 := a_j(1-a_j) + \alpha\big(\sigma_j^2 - a_j(1-a_j)\big).
$$
By the very definition of Y_{N,k(N)} it suffices to prove (26) only in the case K = 1, p_1 = 1 and j = 1. We find it instructive to first prove a central limit theorem for Y_{N,k(N)} of the form
$$
\sqrt{k(N)}\Big(\frac{Y_{N,k(N)}}{k(N)} - a_1\Big) \Longrightarrow \mathsf{N}\big(0, \sigma_{\alpha,1}^2\big), \tag{28}
$$
which explains the formula for the variance σ²_{α,1} in (26). Let B_N be a random variable with the beta distribution Beta(γ_{1,1}N, γ_{1,2}N), and let C^{(1)}_N and C^{(2)}_N be independent random variables with gamma distributions with parameters (γ_{1,1}N, 1) and (γ_{1,2}N, 1), respectively. Using a representation of the beta distribution, see [15, Theorem 3], and the central limit theorem for C^{(1)}_N and C^{(2)}_N, one obtains
$$
\sqrt{N}(B_N - a_1) \Longrightarrow \mathsf{N}(0, s^2), \qquad \text{where } s^2 = \frac{\gamma_{1,1}\gamma_{1,2}}{(\gamma_{1,1}+\gamma_{1,2})^3}. \tag{29}
$$
Moreover, let (U_k)_{k∈N} be a sequence of independent copies of a random variable with the uniform distribution on [0, 1]. It is well known that the empirical process of (U_k) converges in the Skorokhod J_1-topology on D[0, 1] to a standard Brownian bridge (W(t))_{t∈[0,1]}. By independence and our assumption k(N)/N → α, we have a joint convergence of the empirical process and √N(B_N − a_1) in (31). Applying a mapping which is a.s. continuous at the limiting point in (31), we arrive at (28), since Var(W(a_1)) = a_1(1 − a_1). Fix ε ∈ (0, min(a_1, 1 − a_1)). By the law of total variance and lim_{N→∞} Var(√N B_N) = s² (see (29)), the variance of Y_{N,k(N)} can be bounded appropriately; this estimate, in conjunction with a standard tail estimate for the normal law, implies that (26) is equivalent to its restriction to the bulk of the distribution. The latter can be deduced from the explicit formula (33) for the probabilities of Y_{N,k(N)} together with the asymptotic relation (35) for the beta-function, which is uniform in x, y ∈ [δ, δ^{−1}], for every fixed δ ∈ (0, 1). Our choice of ε ensures that all the arguments of the beta-functions in (33) lie in [δk(N), δ^{−1}k(N)] for a sufficiently small δ > 0. The uniform asymptotic relation (35) is a consequence of Stirling's formula with a uniform estimate of the remainder; see, for example, Eq. (5.11.10) and (5.11.11) in [18]. The proof of Proposition 3.2 is complete.
Remark 3.6. The theorem above describes the residual dependence in the propagation of chaos phenomenon via a beta-binomial distribution, which in turn has a simple interpretation as follows. Consider a Pólya urn which initially contains γ_1(β, h)N positive spins (white balls) and γ_2(β, h)N negative spins (black balls). Balls are drawn one at a time and immediately returned to the urn together with a new ball of the same color. The number of white balls drawn after k(N) trials has the beta-binomial distribution Bin(k(N), Beta(γ_1(β, h)N, γ_2(β, h)N)). Thus, Theorem 3.5 tells us that the number of positive spins under the Gibbs measure is close in distribution to the number of white balls drawn from the above Pólya urn after k(N) trials.
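The Pólya-urn interpretation is easy to simulate (the initial ball counts below are illustrative stand-ins for γ_1N and γ_2N, and the function names are ours): the white-ball count after k draws is beta-binomial, hence overdispersed relative to a plain binomial with the same mean.

```python
import numpy as np

def polya_white_count(white, black, draws, rng):
    """Simulate a Polya urn with unit reinforcement; return # white balls drawn."""
    w = 0
    for t in range(draws):
        # current white count: white + w; current total: white + black + t
        if rng.random() < (white + w) / (white + black + t):
            w += 1
    return w

rng = np.random.default_rng(0)
white, black, k = 60.0, 40.0, 50       # illustrative stand-ins for gamma1*N, gamma2*N
samples = np.array([polya_white_count(white, black, k, rng) for _ in range(20_000)])
# beta-binomial Bin(k, Beta(60, 40)): mean = k*60/100 = 30,
# variance = k*60*40*(100+k)/(100**2*101) ~ 17.8, vs 12 for Bin(50, 0.6)
```
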
CASE h ≠ 0 AND β > 0. In this case we again apply Theorem 3.5. From (26) it follows that Y_{N,k(N)} satisfies a local limit theorem with the appropriate Gaussian limit. Combining this with the de Moivre-Laplace local limit theorem and using Proposition 5.1, we arrive at (15). Returning to the comparison with the uniform distribution on the sphere (see Remark 3.8): for k(N)/N → 0 it has been shown in [8, Section 2] that the total variation distance between the distribution of (ξ_{1;N}, …, ξ_{k;N}) and the standard normal distribution on R^k converges to 0. On the other hand, for k(N)/N → α with α ≠ 0, the total variation distance between these distributions converges to a non-zero limit which has been identified in [9, Theorem 1.6 (b)]; see also Eq. (2.12) on p. 403 in [8]. All these results are similar to what we know about the Curie-Weiss model. There is, however, one important difference: an approximation by a variance mixture of normal distributions is not possible if α > 0 in the setting of ξ_N. A much more general result has been shown in [9, Theorem 2.3 (b)]. Let us give a short informal argument. Let (ζ_1, …, ζ_k) be a random vector with the standard normal distribution on R^k and let R > 0 be a mixing variable independent of the ζ_i's. We ask whether it is possible to approximate (ξ_{1;N}, …, ξ_{k;N}) by R·(ζ_1, …, ζ_k) in the total variation distance. By rotation invariance, it suffices to consider the distance between the squared radial parts; see Eq. (2.4) on p. 402 in [8]. Since ξ²_{1;N} + ⋯ + ξ²_{N;N} = N, the squared radial part ξ²_{1;N} + ⋯ + ξ²_{k;N} has expectation k and variance asymptotically equivalent to 2k(1 − α) ≈ 2αN(1 − α). On the other hand, the squared radial part of R·(ζ_1, …, ζ_k) equals R²χ²_k, where χ²_k has a chi-square distribution with k degrees of freedom, and we have
$$
\mathrm{Var}(R^2\chi^2_k) = \mathbb{E}(R^4)(2k + k^2) - \big(\mathbb{E}(R^2)\big)^2 k^2,
$$
where we used E(χ⁴_k) = 2k + k². If we want to match the expectations, we need E(R²) = 1, and then Jensen's inequality gives Var(R²χ²_k) ≥ 2k ≈ 2αN, so we cannot match the variances since 2α > 2α(1 − α). This is in sharp contrast to the case of the Curie-Weiss model, for which Theorem 3.5 shows that the larger variance of P_{k(N)} can be artificially matched by a mixed binomial distribution.

PROOF OF PROPOSITION 2.2
Recall that we work under the assumptions h = 0 and β > 1, which imply m := m(β, 0) ∈ (0, 1). For simplicity we also assume throughout this proof that N = 2n is even. The case of odd N can be treated similarly. For every −n ≤ ℓ ≤ n, we have
$$
\mu_N\Big(m_N = \frac{\ell}{n}\Big) = \frac{1}{Z_{N,\beta,0}}\binom{2n}{n+\ell}\exp\Big(\frac{\beta \ell^2}{n}\Big).
$$
For further use we record the asymptotic formula for the partition function Z_{N,β,0} for β > 1, which can be derived from Théorème B in [3] by specializing to the Curie-Weiss model, or from (3.19) in [16] by choosing p = 1. The function ℓ ↦ µ_N(m_N = ℓ/n) is even and attains two local maxima, at ℓ ≈ nm and ℓ ≈ −nm. This can be checked by calculating the ratio of its two consecutive values. Using the aforementioned symmetry and the standard tail estimate for the normal density, we see that the main contribution to (17) comes from a set C_n of indices near the two maxima. To check that the indices outside C_n are negligible, we use the same reasoning as in [16].

STEP 2.
Lemma 3.1 implies that the µ_N-distribution of P_k is close (in the sense of the d_TV-distance) to the mixed binomial distribution Bin(k, L(W)). Formal verification of this employs the well-known coupling characterization (19) of the d_TV-distance: there exist random variables Θ'_N (respectively, Y'_{N,N}) which have the same distribution as Θ_N (respectively, Y_{N,N}) for every N ∈ N and satisfy P(Θ'_N ≠ Y'_{N,N}) → 0. Therefore,
$$
\begin{aligned}
d_{\mathrm{TV}}\big(\mathcal{L}(X_N),\, \mathrm{HyperG}(N, \mathcal{L}(Y_{N,N}), k(N))\big)
&= d_{\mathrm{TV}}\big(\mathrm{HyperG}(N, \mathcal{L}(\Theta_N), k(N)),\, \mathrm{HyperG}(N, \mathcal{L}(Y_{N,N}), k(N))\big)\\
&= d_{\mathrm{TV}}\big(\mathrm{HyperG}(N, \mathcal{L}(\Theta'_N), k(N)),\, \mathrm{HyperG}(N, \mathcal{L}(Y'_{N,N}), k(N))\big) \to 0.
\end{aligned}
$$
But this immediately yields the claim, since HyperG(N, L(Y_{N,N}), k(N)) has the same distribution as Y_{N,k(N)} by Lemma 3.1. Thus, it remains to prove (24).

Remark 3.8.
Here we give a comparison of our findings with some known results on the uniform distribution on a high-dimensional sphere, which bears some similarities with the Curie-Weiss model in the context of the propagation of chaos phenomenon. For every N ∈ N, consider a random vector ξ_N = (ξ_{1;N}, …, ξ_{N;N}) which is uniformly distributed on the sphere √N S^{N−1} of radius √N in R^N. It is a classical result of Maxwell-Poincaré-Borel that, for every fixed k ∈ N, the distribution of any k components of ξ_N converges weakly to a k-dimensional standard normal distribution as N → ∞. Moreover, if k = k(N) depends on N in such a way that k(N)/N → 0, then a total variation version of this statement has been established in [8, Section 2].