A Central Limit Theorem for Weighted Averages of Spins in the High Temperature Region of the Sherrington-Kirkpatrick Model

In this paper we prove that in the high temperature region of the Sherrington-Kirkpatrick model, for a typical realization of the disorder, the weighted average of spins $\sum_{i\le N} t_i\sigma_i$ will be approximately Gaussian provided that $\max_{i\le N}|t_i| / \sum_{i\le N} t_i^2$ is small.


Introduction.
Consider a space of configurations $\Sigma_N = \{-1,+1\}^N$. A configuration $\sigma \in \Sigma_N$ is a vector $(\sigma_1, \ldots, \sigma_N)$ of spins $\sigma_i$, each of which can take the values $\pm 1$. Consider an array $(g_{ij})_{i,j\le N}$ of i.i.d. standard normal random variables, called the disorder. Given parameters $\beta > 0$ and $h \ge 0$, let us define a Hamiltonian on $\Sigma_N$,
$$-H_N(\sigma) = \frac{\beta}{\sqrt{N}} \sum_{1\le i<j\le N} g_{ij}\sigma_i\sigma_j + h\sum_{i\le N}\sigma_i,$$
and define the Gibbs' measure $G$ on $\Sigma_N$ by
$$G(\{\sigma\}) = \exp(-H_N(\sigma))/Z_N.$$
The normalizing factor $Z_N$ is called the partition function. Gibbs' measure $G$ is a random measure on $\Sigma_N$ since it depends on the disorder $(g_{ij})$. The parameter $\beta$ physically represents the inverse of the temperature, and in this paper we will consider only the (very) high temperature region of the Sherrington-Kirkpatrick model, which corresponds to
$$\beta < \beta_0 \qquad (1.1)$$
for some small absolute constant $\beta_0 > 0$. The actual value of $\beta_0$ is not specified here but, in principle, it can be determined through careful analysis of all arguments of this paper and of the references to other papers. For any $n \ge 1$ and a function $f$ on the product space $(\Sigma_N^n, G^{\otimes n})$, $\langle f\rangle$ will denote its expectation with respect to $G^{\otimes n}$,
$$\langle f\rangle = \sum_{\Sigma_N^n} f(\sigma^1, \ldots, \sigma^n)\, G^{\otimes n}(\{(\sigma^1, \ldots, \sigma^n)\}).$$
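For very small $N$ the Gibbs' measure can be computed by direct enumeration, which may help to fix the notation. The following is a minimal sketch, assuming the standard form of the Sherrington-Kirkpatrick Hamiltonian, $-H_N(\sigma) = \frac{\beta}{\sqrt{N}}\sum_{i<j} g_{ij}\sigma_i\sigma_j + h\sum_i \sigma_i$; the function names `sk_gibbs` and `gibbs_average` are illustrative and do not come from the paper.

```python
import itertools
import math
import random


def sk_gibbs(N, beta, h, seed=0):
    """Return (configs, weights, Z) for the Gibbs measure on {-1,+1}^N.

    Assumption: -H_N(s) = beta/sqrt(N) * sum_{i<j} g_ij s_i s_j + h * sum_i s_i,
    with g_ij i.i.d. standard normal (one fixed realization per seed).
    """
    rng = random.Random(seed)
    g = {(i, j): rng.gauss(0.0, 1.0) for i in range(N) for j in range(i + 1, N)}
    configs = list(itertools.product((-1, 1), repeat=N))
    boltzmann = []
    for s in configs:
        interaction = sum(g[i, j] * s[i] * s[j] for (i, j) in g)
        minus_h = beta / math.sqrt(N) * interaction + h * sum(s)
        boltzmann.append(math.exp(minus_h))
    Z = sum(boltzmann)  # partition function Z_N
    weights = [w / Z for w in boltzmann]
    return configs, weights, Z


def gibbs_average(f, configs, weights):
    """Gibbs average <f> of a function f of a single configuration."""
    return sum(w * f(s) for s, w in zip(configs, weights))


if __name__ == "__main__":
    configs, weights, Z = sk_gibbs(N=8, beta=0.3, h=0.2)
    print("Z_N =", Z)
    print("<sigma_1> =", gibbs_average(lambda s: s[0], configs, weights))
```

The enumeration costs $2^N$ evaluations, so this is only a sanity check for small systems and says nothing about the $N \to \infty$ regime studied in the paper.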
In this paper we will prove the following result concerning the high temperature region (1.1). Given a vector $(t_1, \ldots, t_N)$ such that
$$\sum_{i\le N} t_i^2 = 1, \qquad (1.2)$$
let us consider a random variable on $(\Sigma_N, G)$ defined as
$$X = \sum_{i\le N} t_i\sigma_i. \qquad (1.3)$$
The main goal of this paper is to show that in the high temperature region (1.1) the following holds. If $\max_{i\le N}|t_i|$ is small then, for a typical realization of the disorder $(g_{ij})$, the random variable $X$ is approximately Gaussian with mean $\langle X\rangle$ and variance $\langle X^2\rangle - \langle X\rangle^2$. By a "typical realization" we mean that the statement holds on a set of disorder of measure close to one. This result is the analogue of a very classical result for independent random variables. Namely, given a sequence of independent random variables $\xi_1, \ldots, \xi_N$ satisfying some integrability conditions, the random variable $\xi_1 + \ldots + \xi_N$ will be approximately Gaussian if $\max_{i\le N}\mathrm{Var}(\xi_i) / \sum_{i\le N}\mathrm{Var}(\xi_i)$ is small (see, for example, [5]). In particular, if $\sigma_1, \ldots, \sigma_N$ in (1.3) were i.i.d. Bernoulli random variables then $X$ would be approximately Gaussian provided that $\max_{i\le N}|t_i|$ is small. It is important to note at this point that the main claim of this paper is, in some sense, an expected result, since it is well known that in the high temperature region the spins become "decoupled" in the limit $N \to \infty$. For example, Theorem 2.4.10 in [12] states that for a fixed $n \ge 1$, for a typical realization of the disorder $(g_{ij})$, the distribution $G^{\otimes n}$ becomes a product measure when $N \to \infty$. Thus, in essence, the claim that $X$ in (1.3) is approximately Gaussian is a central limit theorem for weakly dependent random variables. However, the entire sequence $(\sigma_1, \ldots, \sigma_N)$ is a much more complicated object than a fixed finite subset $(\sigma_1, \ldots, \sigma_n)$, and some unexpected complications arise that we will try to describe after we state our main result, Theorem 1 below.
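The classical statement for independent spins can be checked exactly at the level of the fourth moment. The sketch below assumes symmetric $\pm 1$ signs $\varepsilon_i$ (a centered special case of the Bernoulli variables mentioned above) and unit-norm weights; then $E X^4 = 3 - 2\sum_i t_i^4$ for $X = \sum_i t_i\varepsilon_i$, so the distance to the Gaussian value 3 is at most $2\max_i t_i^2$. The helper names are illustrative.

```python
import itertools
import math


def normalize(t):
    """Rescale the weights so that sum_i t_i^2 = 1."""
    norm = math.sqrt(sum(ti * ti for ti in t))
    return [ti / norm for ti in t]


def fourth_moment_exact(t):
    """E[X^4] for X = sum_i t_i * eps_i, by enumerating all 2^n sign patterns."""
    n = len(t)
    total = 0.0
    for eps in itertools.product((-1, 1), repeat=n):
        x = sum(ti * ei for ti, ei in zip(t, eps))
        total += x ** 4
    return total / 2 ** n


def gaussian_gap(t):
    """|E[X^4] - 3| for unit-norm weights; 3 is the Gaussian fourth moment."""
    return abs(fourth_moment_exact(t) - 3.0)


if __name__ == "__main__":
    flat = normalize([1.0] * 10)           # max |t_i| is small
    spiky = normalize([5.0] + [1.0] * 5)   # one dominant weight
    for name, t in (("flat", flat), ("spiky", spiky)):
        print(name, "max|t_i| =", max(abs(x) for x in t), "gap =", gaussian_gap(t))
```

A vector with one dominant weight keeps the fourth moment far from 3, while spreading the mass over many coordinates closes the gap, which is the mechanism behind the smallness condition on $\max_{i\le N}|t_i|$.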
Instead of dealing with the random variable $X$ we will look at its symmetrized version $Y = X - X'$, where $X'$ is an independent copy of $X$. If we can show that $Y$ is approximately Gaussian then, obviously, $X$ will also be approximately Gaussian. The main reason to consider a symmetrized version of $X$ is very simple: it makes it much easier to keep track of the numerous indices in all the arguments below, even though it would be possible to carry out similar arguments for the centered version $X - \langle X\rangle$.
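The elementary properties of the symmetrized variable can be verified on any probability measure on $\Sigma_N$. The sketch below uses an arbitrary random measure (an assumption made purely for illustration, it is not the Gibbs' measure of the model) and checks that the odd moments of $Y = X - X'$ vanish and that its second moment equals twice the variance of $X$.

```python
import itertools
import random


def random_measure(N, seed=0):
    """An arbitrary strictly positive probability measure on {-1,+1}^N."""
    rng = random.Random(seed)
    configs = list(itertools.product((-1, 1), repeat=N))
    raw = [rng.random() + 0.01 for _ in configs]
    total = sum(raw)
    return configs, [r / total for r in raw]


def moment_X(k, t, configs, weights):
    """k-th moment of X = sum_i t_i s_i under the given measure."""
    return sum(
        w * sum(ti * si for ti, si in zip(t, s)) ** k
        for s, w in zip(configs, weights)
    )


def moment_Y(k, t, configs, weights):
    """k-th moment of Y = X - X' under the product of two copies of the measure."""
    xs = [sum(ti * si for ti, si in zip(t, s)) for s in configs]
    return sum(
        w1 * w2 * (x1 - x2) ** k
        for x1, w1 in zip(xs, weights)
        for x2, w2 in zip(xs, weights)
    )


if __name__ == "__main__":
    configs, weights = random_measure(5, seed=3)
    t = [0.5, -0.2, 0.7, 0.1, 0.4]
    var_x = moment_X(2, t, configs, weights) - moment_X(1, t, configs, weights) ** 2
    print("odd moments of Y:", moment_Y(1, t, configs, weights), moment_Y(3, t, configs, weights))
    print("<Y^2> and 2 Var(X):", moment_Y(2, t, configs, weights), 2 * var_x)
```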
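In order to show that for a typical realization $(g_{ij})$ and a small $\max_{i\le N}|t_i|$, $Y$ is approximately Gaussian with mean $0$ and variance $\langle Y^2\rangle$, we will proceed by showing that its moments behave like moments of a Gaussian random variable, i.e.
$$\langle Y^l\rangle \approx a(l)\,\langle Y^2\rangle^{l/2}, \qquad (1.4)$$
where $a(l) = E g^l$ for a standard normal random variable $g$. Since the moments of the standard normal random variable are also characterized by the recursive formulas $a(0) = 1$, $a(1) = 0$ and $a(l) = (l-1)a(l-2)$, (1.4) is equivalent to
$$\langle Y^l\rangle \approx (l-1)\,\langle Y^2\rangle\,\langle Y^{l-2}\rangle.$$
Let us define two sequences $(\sigma^{1(l)})_{l\ge 0}$ and $(\sigma^{2(l)})_{l\ge 0}$ of jointly independent random variables with Gibbs' distribution $G$. We will assume that all indices $1(l)$ and $2(l)$ are different, and one can think of $\sigma^{1(l)}$ and $\sigma^{2(l)}$ as different coordinates of the infinite product space $(\Sigma_N^\infty, G^{\otimes\infty})$. Let us define a sequence $S_l$ by
$$S_l = \sum_{i\le N} t_i\bar\sigma_i^l, \quad \text{where } \bar\sigma^l = \sigma^{1(l)} - \sigma^{2(l)}. \qquad (1.5)$$
In other words, $S_l$ are independent copies of $Y$.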
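The recursion for Gaussian moments is easy to tabulate. The following sketch uses the initial values $a(0) = 1$ and $a(1) = 0$ of the standard normal distribution together with $a(l) = (l-1)a(l-2)$, and compares the result with the closed form $a(2k) = (2k)!/(2^k k!)$; the function names are illustrative.

```python
import math


def gaussian_moment(l):
    """a(l) = E g^l for standard normal g, via a(l) = (l - 1) * a(l - 2)."""
    if l < 0:
        raise ValueError("moment order must be non-negative")
    if l == 0:
        return 1  # a(0) = E 1
    if l == 1:
        return 0  # a(1) = E g
    a_prev2, a_prev1 = 1, 0  # a(0), a(1)
    for m in range(2, l + 1):
        a_prev2, a_prev1 = a_prev1, (m - 1) * a_prev2
    return a_prev1


def gaussian_moment_closed_form(l):
    """Closed form: 0 for odd l, (2k)! / (2^k k!) for l = 2k."""
    if l % 2 == 1:
        return 0
    k = l // 2
    return math.factorial(l) // (2 ** k * math.factorial(k))


if __name__ == "__main__":
    for l in range(9):
        print(l, gaussian_moment(l), gaussian_moment_closed_form(l))
```

The odd moments vanish, which matches the symmetry of $Y$, so only even powers carry information in the moment comparison.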
The following Theorem is the main result of the paper.
Remark. Theorem 1 answers the question raised in Research Problem 2.4.11 in [12]. Theorem 1 easily implies the estimate (1.7). Indeed, for the first and second terms on the right-hand side one can use independent copies to represent the powers of $\langle\cdot\rangle$ and then apply Theorem 1; combining the resulting equations proves (1.7). Now one can show that for $N \to \infty$ and $\max_{i\le N}|t_i| \to 0$ the characteristic function of $(S_1, \ldots, S_n)$ can be approximated by the characteristic function of $n$ independent Gaussian random variables with variance $\langle S_1^2\rangle$, for $(g_{ij})$ on a set of measure converging to 1. Given (1.7) this should be a mere exercise and we omit the details. This, of course, implies that $(S_1, \ldots, S_n)$ are approximately independent Gaussian random variables with respect to the measure $G^{\otimes\infty}$ and, in particular, $S_1 = \sum_{i\le N} t_i\bar\sigma_i$ is approximately Gaussian with respect to the measure $G^{\otimes 2}$.
Theorem 1 looks very similar to the central limit theorem for the overlap
$$R_{1,2} = \frac{1}{N}\sum_{i\le N}\sigma_i^1\sigma_i^2,$$
where $\sigma^1, \sigma^2$ are two independent copies of $\sigma$ (see, for example, Theorem 2.3.9 and Section 2.7 in [12]). In fact, in our proofs we follow the main ideas and techniques of Sections 2.4 - 2.7 in [12]. However, the proof of the central limit theorem for $X$ in (1.3) turned out to be at least an order of magnitude more technically involved than the proof of the central limit theorem for the overlap $R_{1,2}$ (at least we do not know any easier proof). One of the main reasons why the situation here gets more complicated is the absence of symmetry. Let us try to explain this informally. When dealing with the overlaps $R_{i,j}$ one considers a quantity of the following type,
$$E\Big\langle \prod_{i<j} R_{i,j}^{k_{i,j}}\Big\rangle, \qquad (1.8)$$
and approximates it by other similar quantities using the cavity method. In the cavity method one approximates the Gibbs' measure $G$ by a measure with the last coordinate $\sigma_N$ independent of the other coordinates, and this is achieved by a proper interpolation between these measures. As a result, the average (1.8) is approximated by the Taylor expansion along this interpolation, and at the second order of approximation one gets terms that have "smaller complexity" and a term that is a multiple of (1.8); one can then solve for (1.8) and proceed by induction on the "complexity". The main reason this trick works is the symmetry of the expression (1.8) with respect to all coordinates of the configurations. Unfortunately, this no longer happens in the setting of Theorem 1, due to the lack of symmetry of $X$ in the coordinates of $\sigma$. Instead, we will have to consider both terms on the left hand side of (1.6), and approximate both of them using the cavity method.
The technical difficulty of the proof comes from the fact that at the second order of approximation it is not immediately obvious which terms corresponding to the two expressions in (1.9) cancel each other up to the correct error terms, and this requires some work. Moreover, in order to obtain the correct error terms, we will need to make two coordinates $\sigma_N$ and $\sigma_{N-1}$ independent of the other coordinates. To simplify the computations and to avoid using the cavity method on each coordinate separately, we will develop the cavity method for two coordinates. Finally, another difficulty that arises from the lack of symmetry is that, unlike in the case of the overlaps $R_{i,j}$, we were not able to compute explicitly the expectation $\langle X\rangle$ and the variance $\langle X^2\rangle - \langle X\rangle^2$ in terms of the parameters of the model.

Preliminary results.
We will first state several results from [12] that will be constantly used throughout the paper. Lemmas 1 through 6 below are either taken directly from [12] or are almost identical to some of the results of [12] and, therefore, we will state them without proof. Let us consider where $z$ is a standard normal r.v. independent of the disorder $(g_{ij})$ and $q$ is the unique solution of the equation For $0 \le t \le 1$ let us consider the Hamiltonian and define Gibbs' measure $G_t$ and expectation $\langle\cdot\rangle_t$ similarly to $G$ and $\langle\cdot\rangle$ above, only using the Hamiltonian $-H_{N,t}(\sigma)$. For any $n \ge 1$ and a function $f$ on $\Sigma_N^n$ let us define The case $t = 1$ corresponds to the Hamiltonian $-H_N(\sigma)$, and the case $t = 0$ has the very special property that the last coordinate $\sigma_N$ is independent of the other coordinates, which is the main idea of the cavity method (see [12]). (The cavity method is a classical and fruitful idea in physics, but in this paper we refer to a specific version of the cavity method invented by Talagrand.) Given indices $l, l'$, let us define The following Lemma holds.
Lemma 1 For $0 \le t < 1$ and for all functions $f$ on $\Sigma_N^n$ we have (2.3). This is Proposition 2.4.5 in [12].
Roughly speaking, these two results explain the main idea behind the key methods of [12], the cavity method and the smart path method. The Hamiltonian (2.2) represents a "smart path" between the measures $G$ and $G_0$, since along this path the derivative $\nu_t'(f)$ is small, because all terms in (2.3) contain a factor $R_{l,l'} - q$ which is small due to (2.4). The measure $G_0$ has a special coordinate (cavity) $\sigma_N$ that is independent of the other coordinates, which in many cases makes it easier to analyze $\nu_0(f)$.
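The parameter $q$ that enters the smart path can be computed numerically. The sketch below assumes that the equation defining $q$ is the standard replica-symmetric equation of [12], $q = E\tanh^2(\beta z\sqrt{q} + h)$ with $z$ standard normal; this form is an assumption, since the equation is not displayed above. The Gaussian expectation is approximated by the trapezoidal rule, and the function names are illustrative.

```python
import math


def gaussian_expectation(f, half_width=8.0, steps=4000):
    """Approximate E f(z), z ~ N(0,1), by the trapezoidal rule on [-half_width, half_width]."""
    dz = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)


def rs_map(q, beta, h):
    """Right-hand side of the assumed equation q = E tanh^2(beta * z * sqrt(q) + h)."""
    return gaussian_expectation(lambda z: math.tanh(beta * z * math.sqrt(q) + h) ** 2)


def solve_q(beta, h, iterations=100):
    """Fixed-point iteration; for small beta the map is a contraction on [0, 1]."""
    q = 0.5
    for _ in range(iterations):
        q = rs_map(q, beta, h)
    return q


if __name__ == "__main__":
    for beta, h in ((0.1, 0.5), (0.3, 0.5), (0.3, 0.0)):
        print("beta =", beta, "h =", h, "q =", solve_q(beta, h))
```

For $\beta = 0$ the iteration returns $\tanh^2 h$, and for $h = 0$ with small $\beta$ it collapses to $q = 0$, which are the two easy sanity checks for this equation.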
These two lemmas imply the following Taylor expansion for $\nu(f)$.

Lemma 3
For a function $f$ on $\Sigma_N^n$ we have the expansion (2.5). Proof. The proof is identical to that of Proposition 2.5.3 in [12].
Cavity method with two coordinates. In this paper we will use another version of the cavity method, with the two coordinates $\sigma_N, \sigma_{N-1}$ playing the special role. In this new case we will consider a "smart path" that makes both coordinates $\sigma_N$ and $\sigma_{N-1}$ independent of the other coordinates and of each other. This is done by slightly modifying the definition of the Hamiltonian (2.2). Since it will always be clear from the context which "smart path" we are using, we will abuse the notation and use the same symbols as in the case of the Hamiltonian (2.2).
Let us consider where z 1 , z 2 are standard normal r.v. independent of the disorder (g ij ).
For $0 \le t \le 1$ let us now consider the Hamiltonian and define Gibbs' measure $G_t$ and expectation $\langle\cdot\rangle_t$ similarly to $G$ and $\langle\cdot\rangle$ above, only using the Hamiltonian (2.6). For any $n \ge 1$ and a function $f$ on $\Sigma_N^n$ let us define We will make one distinction in the notation between the cases (2.2) and (2.6). Namely, for $t = 0$ in the case of the Hamiltonian (2.6) we will denote the corresponding Gibbs' average and expectation by $\langle\cdot\rangle_{00}$ and $\nu_{00}$. It is clear that with respect to the Gibbs' measure $G_0$ the last two coordinates $\sigma_N$ and $\sigma_{N-1}$ are independent of the other coordinates and of each other. Given indices $l, l'$ let us define The following lemma is the analogue of Lemma 1 for the case of the Hamiltonian (2.6).
Lemma 4 Consider $\nu_t(\cdot)$ that corresponds to the Hamiltonian (2.6). Then, for $0 \le t < 1$ and for all functions $f$ on $\Sigma_N^n$ we have Proof. The proof repeats the proof of Proposition 2.4.5 in [12] almost without changes.
Lemma 5 There exist $\beta_0 > 0$ and $L > 0$ such that for $\beta < \beta_0$ and for any $k \ge 1$, The second inequality is similar to (2.4) and follows easily from it since $|R_{1,2} - R^{=}_{1,2}| \le 2/N$ (see, for example, the proof of Lemma 2.5.2 in [12]). The first inequality follows easily from Lemma 4 (see, for example, Proposition 2.4.6 in [12]). Lemma 3 above also holds in the case of the Hamiltonian (2.6).
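The bound $|R_{1,2} - R^{=}_{1,2}| \le 2/N$ used above is purely combinatorial and can be confirmed by enumeration. The sketch assumes that $R^{=}_{1,2}$ denotes the overlap with the last two coordinates removed and the same normalization $1/N$, i.e. $R^{=}_{1,2} = \frac{1}{N}\sum_{i\le N-2}\sigma_i^1\sigma_i^2$; this reading of the notation is an assumption, and the function names are illustrative.

```python
import itertools


def overlap(s1, s2, drop_last=0):
    """(1/N) * sum of s1_i * s2_i over the first N - drop_last coordinates."""
    N = len(s1)
    kept = N - drop_last
    return sum(a * b for a, b in zip(s1[:kept], s2[:kept])) / N


def max_overlap_gap(N):
    """Largest value of |R_{1,2} - R^=_{1,2}| over all pairs of configurations."""
    configs = list(itertools.product((-1, 1), repeat=N))
    return max(
        abs(overlap(a, b) - overlap(a, b, drop_last=2))
        for a in configs
        for b in configs
    )


if __name__ == "__main__":
    for N in (3, 4, 5):
        print(N, max_overlap_gap(N), 2 / N)
```

The difference is exactly $\frac{1}{N}(\sigma_{N-1}^1\sigma_{N-1}^2 + \sigma_N^1\sigma_N^2)$, so the bound is attained whenever the two replicas agree or disagree on both removed coordinates.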

Lemma 6
For a function f on Σ n N we have The proof is identical to the proof of Proposition 2.5.3 in [12].
To prove Theorem 1 we will need several preliminary results. First, it will be very important to control the size of the random variables S l and we will start by proving exponential integrability of S l .
Theorem 2 There exist $\beta_0 > 0$ and $L > 0$ such that for all $\beta \le \beta_0$ and for all $k \ge 1$ (2.14) The statement of Theorem 2 is, obviously, equivalent to for large enough $L$.
Proof. The proof mimics the proof of Theorem 2.5.1 in [12] (stated in Lemma 2 above). We will prove Theorem 2 by induction over $k$. Our induction assumption will be the following: there exist $\beta_0 > 0$ and $L > 0$ such that for all Let us start by proving this statement for $k = 1$. We have Thus we need to prove that We will now show that $\nu_0(\bar\sigma_1\bar\sigma_N) = 0$ and $\nu_0'(\bar\sigma_1\bar\sigma_N) = O(N^{-1})$. The fact that $\nu_0(\bar\sigma_1\bar\sigma_N) = 0$ is obvious since for the measure $G_0^{\otimes 2}$ the last coordinates $\sigma_N^1, \sigma_N^2$ are independent of the first $N - 1$ coordinates and ν_0( Since for a fixed disorder (r.v. $g_{ij}$ and $z$) the last coordinates $\sigma_N^i$, $i \le 4$, are independent of the first $N - 1$ coordinates and independent of each other, we can write First of all, the first and the last terms are equal to zero because $\langle\sigma_N^1 - \sigma_N^2\rangle_0 = 0$. Next, by symmetry Therefore, we get . In order to avoid introducing new notation we notice that it is equivalent to proving that ν( where $q^-$ is the solution of (2.1) with $\beta$ substituted with $\beta^-$, we would get Lemma 2.4.15 in [12] states that for $\beta \le \beta_0$, $|q - q^-| \le LN^{-1}$ and, therefore, the above inequality would imply that $\nu_0(\bar\sigma_1(R^-_{1,3} - q)) = O(N^{-1})$. To prove (2.16) we notice that by symmetry $\nu(\bar\sigma_1(R_{1,3} - q)) = \nu(\bar\sigma_N(R_{1,3} - q))$, and we apply (2.5) which in this case implies that where in the last inequality we used (2.4). Finally, This finishes the proof of (2.15) for $k = 1$. It remains to prove the induction step. One can write Let us define $\nu_i(\cdot)$ in the same way we defined $\nu_0(\cdot)$, only now the $i$-th coordinate plays the same role as the $N$-th coordinate played for $\nu_0$. Using Proposition 2.4.7 in [12] we get that for any $\tau_1, \tau_2 > 1$ such that $1/\tau_1 + 1/\tau_2 = 1$, Let us take $\tau_1 = (2k + 2)/(2k + 1)$ and $\tau_2 = 2k + 2$. By (2.4) we can estimate then for these parameters the induction step is not needed since this inequality is precisely what we are trying to prove.
Thus, without loss of generality, we can assume that ν Combining this with (2.18), (2.19) and (2.20) we get Plugging this estimate into (2.17) we get One can write, we get .

(2.22)
First of all, by the induction hypothesis (2.15) we have since this is exactly (2.15) for parameters $N - 1$, $\beta^- = \beta\sqrt{1 - 1/N}$, and since $\sum_{j\ne i} t_j^2 \le 1$. Next, by Proposition 2.4.6 in [12] we have where in the last inequality we again used (2.15). Thus, (2.21) and (2.22) imply for $L$ large enough. This completes the proof of the induction step and of Theorem 2.
Remark. Theorem 2 and Lemmas 2 and 5 will often be used implicitly in the proof of Theorem 1 in the following way. For example, if we consider a sequence $S_l$ defined in (1.5) then by Hölder's inequality (first with respect to $\langle\cdot\rangle$ and then with respect to $E$) one can write where in the last equality we applied Theorem 2 and Lemma 2. Similarly, when we consider a function that is a product of factors of the type $R_{l,l'} - q$ or $S_l$, we will simply say that each factor $R_{l,l'} - q$ contributes $O(N^{-1/2})$ and each factor $S_l$ contributes $O(1)$.
The following result plays the central role in the proof of Theorem 1. We consider a function
$$\varphi = \prod_{l=1}^{n} (S_l)^{q_l},$$
where $S_l$ are defined in (1.5) and where $q_l$ are arbitrary natural numbers, and we consider the following quantity: $\nu\big((R_{l,l'} - q)(R_{m,m'} - q)\varphi\big)$.
We will show that this quantity essentially does not depend on the choice of the pairs $(l, l')$ and $(m, m')$ or, more accurately, that it depends only on their joint configuration. Quantities of this type will appear when one considers the second derivative of $\nu(\varphi)$, after two applications of Lemma 1 or Lemma 4, and we will be able to cancel some of these terms up to a smaller order approximation.
Proof. The proof is based on the following observation. Given $(l, l')$ consider where $b = \langle\sigma\rangle = (\langle\sigma_i\rangle)_{i\le N}$. One can express $R_{l,l'} - q$ as
$$R_{l,l'} - q = T_{l,l'} + T_l + T_{l'} + T. \qquad (2.25)$$
The joint behavior of these quantities (2.24) was completely described in Sections 6 and 7 of [12]. Our main observation here is that under the restrictions on indices made in the statement of Lemma 7 the function $\varphi$ will be "almost" independent of these quantities and all proofs in [12] can be carried out with some minor modifications. Let us consider the case when $(l, l') \ne (m, m')$ and $(p, p') \ne (r, r')$. Using (2.25) we can write $(R_{l,l'} - q)(R_{m,m'} - q)$ as the sum of terms of the following types: $T_{l,l'}T_{m,m'}$, $T_{l,l'}T_m$, $T_{l,l'}T$, $T_lT_m$, $T_lT$ and $TT$.
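The decomposition (2.25) is an algebraic identity and can be checked numerically. The sketch below assumes the definitions used in [12], namely $T_{l,l'} = \frac{1}{N}\sum_i(\sigma_i^l - b_i)(\sigma_i^{l'} - b_i)$, $T_l = \frac{1}{N}\sum_i(\sigma_i^l - b_i)b_i$ and $T = \frac{1}{N}\sum_i b_i^2 - q$; these formulas are an assumption, since (2.24) is not displayed above. The identity holds for any vector $b$ and any number $q$, so random inputs suffice.

```python
import random


def decomposition_terms(s1, s2, b, q):
    """Return (T_{l,l'}, T_l, T_{l'}, T) under the assumed definitions."""
    N = len(s1)
    T_ll = sum((x - bi) * (y - bi) for x, y, bi in zip(s1, s2, b)) / N
    T_l = sum((x - bi) * bi for x, bi in zip(s1, b)) / N
    T_lp = sum((y - bi) * bi for y, bi in zip(s2, b)) / N
    T = sum(bi * bi for bi in b) / N - q
    return T_ll, T_l, T_lp, T


def check_identity(N=12, q=0.3, seed=0):
    """Return (R_{l,l'} - q, T_{l,l'} + T_l + T_{l'} + T) for random inputs."""
    rng = random.Random(seed)
    s1 = [rng.choice((-1, 1)) for _ in range(N)]
    s2 = [rng.choice((-1, 1)) for _ in range(N)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(N)]  # stands in for <sigma_i>
    lhs = sum(x * y for x, y in zip(s1, s2)) / N - q
    rhs = sum(decomposition_terms(s1, s2, b, q))
    return lhs, rhs


if __name__ == "__main__":
    print(check_identity())
```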
Similarly, we can decompose $(R_{p,p'} - q)(R_{r,r'} - q)$. The terms on the left hand side of (2.23) containing a factor $TT$ will obviously cancel out. Thus, we only need to prove that any other term multiplied by $\varphi$ will produce a quantity of order $O(\max_i|t_i| N^{-1})$. Let us consider, for example, the term $\nu(T_{l,l'}T_{m,m'}\varphi)$. To prove that $\nu(T_{l,l'}T_{m,m'}\varphi) = O(\max_i|t_i| N^{-1})$ we will follow the proof of Proposition 2.6.5 in [12] with some necessary adjustments. Let us consider indices $i(1), i(2), i(3), i(4)$ that are not equal to any of the indices that appear in $T_{l,l'}$, $T_{m,m'}$ or $\varphi$. Then we can write, Let us consider one term in this sum, for example, then we can decompose (2.27) as where $R_1$ is the sum of terms of the following type where $\varphi_j^- = \prod_{l=1}^{n} (S_l^-)^{q_l}/S_j^-$, and $R_2$ is the sum of terms of the following type using Theorem 2 and Lemma 2. To bound $R_1^j$ we notice that $\nu_0(R_1^j) = 0$ and, moreover, $\nu_0'(R_1^j) = O(N^{-1})$, since by (2.3) each term in the derivative will have another factor $R^-_{l,l'} - q$. Therefore, using (2.5) we get $\nu(R_1^j) = O(N^{-1})$. The second term in (2.28) will have order $O(N^{-3/2})$ since and one can again apply (2.5). Thus the last two lines in (2.28) will be of order To estimate the first term in (2.28) we apply Proposition 2.6.3 in [12] which in this case implies Now, using a decomposition similar to (2.27), (2.28) one can easily show that Thus, combining all the estimates the term (2.27) becomes All other terms on the right-hand side of (2.26) can be written in exactly the same way, by using the cavity method in the corresponding coordinate and, thus, (2.26) becomes For small enough $\beta$, e.g. $L\beta^2 \le 1/2$, this implies that $\nu(T_{l,l'}T_{m,m'}\varphi) = O(\max_i|t_i| N^{-1})$. To prove (2.23) in the case when $(l, l') \ne (m, m')$ and $(p, p') \ne (r, r')$, it remains to estimate all other terms produced by the decomposition (2.25), and this is done by following the proofs of the corresponding results in Section 2.6 of [12].
The case when $(l, l') = (m, m')$ and $(p, p') = (r, r')$ is slightly different. The decomposition of $(R_{l,l'} - q)^2$ using (2.25) will produce new terms $\nu(T_{l,l'}^2\varphi)$ and $\nu(T_l^2\varphi)$, which are not small but which, up to terms of order $O(\max_i|t_i| N^{-1})$, will be equal to the corresponding terms produced by the decomposition of $(R_{p,p'} - q)^2$. To see this, once again, one should follow the proofs of the corresponding results in Section 2.6 of [12] with minor changes.

Proof of Theorem 1
Theorem 1 is obvious if at least one $k_l$ is odd, since in this case the left hand side of (1.6) will be equal to 0. We will assume that all $k_l$ are even and, moreover, that at least one of them is greater than 2, say $k_1 \ge 4$. Since $a(l) = (l - 1)a(l - 2)$, in order to prove (1.6) it is, obviously, enough to prove (3.1). We will try to analyze and compare the terms on the left hand side. Let us write (3.2) and (3.3). From now on we will carefully analyze the terms in (3.2) in several steps, and at each step we will notice that one of two things happens: (a) the term produced at the same step of our analysis carried out for (3.3) is exactly the same up to a constant $k_1 - 1$; (b) the term is "small", meaning that after combining all the steps one would get something of order $O(\max_i|t_i|)$.
Let us look at one term in (3.2) and (3.3), for example, If we define $S_l^-$ by the equation First of all, $\nu_0(\mathrm{III}) = \nu_0(\mathrm{VI}) = 0$ and, therefore, applying (2.5) Next, again using (2.5) Thus the contributions of the terms II and V in (3.1) will cancel out, which is the first appearance of case (a) mentioned above. The terms of order $O(t_N^2 + t_N N^{-1/2})$ when plugged back into (3.2) and (3.3) will produce Here we, of course, assume that a similar analysis is carried out for the $i$-th term in (3.2) and (3.3), with the only difference that the $i$-th coordinate plays the special role in the definition of $\nu_0$. We now proceed to analyze the terms I and IV. If we define $S_l^{=}$ by the equation then, and where $R_{23}$ is the sum of terms of the following type for some (not important here) powers $q_l$, and where $R_3$ is the sum of terms of the following type Similarly, and where $\bar R_{23}$ is the sum of terms of the following type for some (not important here) powers $q_l$, and where $\bar R_3$ is the sum of terms of the following type . Indeed, one needs to note that $\nu_{00}(R_{23}) = 0$ and, using Lemma 4, $\nu_{00}'(R_{23}) = O(N^{-1})$, since each term produced by (2.9) will have a factor $\langle\bar\sigma^l_{N-1}\rangle_{00} = 0$, each term produced by (2.10) will have a factor $\langle\bar\sigma^1_N\rangle_{00} = 0$, and each term produced by (2.11) Obviously, $\nu_{00}(R_{1l}) = 0$. To show that $\nu_{00}'(R_{1l}) = 0$, let us first note that the terms produced by (2.9) will contain a factor $\langle\bar\sigma^l_{N-1}\rangle_{00} = 0$, the terms produced by (2.10) will contain a factor $\langle\bar\sigma^1_N\rangle_{00} = 0$, and the terms produced by (2.11) will contain a factor $\langle(S_1^{=})^{k_1-1}\rangle_{00} = 0$, since $k_1 - 1$ is odd and $S_1^{=}$ is symmetric. For the second derivative we will have different types of terms produced by a combination of (2.9), (2.10) and (2.11).
The terms produced by using (2.11) twice will have order $O(N^{-2})$; the terms produced by using (2.11) and either (2.10) or (2.9) will have order $O(N^{-3/2})$, since the factor $R^{=}_{l,l'} - q$ will produce $N^{-1/2}$; the terms produced by (2.9) and (2.9), or by (2.10) and (2.10), will be equal to 0 since they will contain factors $\langle\bar\sigma^l_{N-1}\rangle_{00} = 0$ and $\langle\bar\sigma^1_N\rangle_{00} = 0$ correspondingly. Finally, let us consider the terms produced by (2.9) and (2.10), e.g.
It will obviously be equal to 0 unless $m, p \in \{1(1), 2(1)\}$ and $m', p' \in \{1(l), 2(l)\}$ since, otherwise, there will be a factor $\langle\bar\sigma^1_N\rangle_{00} = 0$ or $\langle\bar\sigma^l_N\rangle_{00} = 0$. All non-zero terms will cancel due to the following observation. Consider, for example, the term which corresponds to $m = 1(1)$, $m' = 1(l)$, $p = 2(1)$ and $p' = 2(l)$. There will also be a similar term that corresponds to $m = 2(1)$, $m' = 1(l)$, $p = 1(1)$ and $p' = 2(l)$ (the indices $m$ and $p$ are exchanged). These two terms will cancel since the product of the first two factors is unchanged and, making the change of variables $1(1) \to 2(1)$, $2(1) \to 1(1)$ in the last factor we get (note that Using ( We will prove only (3.6) since (3.7) is proved similarly. Since $\nu_{00}(R_{21}) = \nu_{00}(\bar R_{21}) = 0$ it is enough to prove that On both sides the terms produced by (2.10) will be equal to 0 and the terms produced by (2.11) will be of order $O(N^{-1})$; thus, it suffices to compare the terms produced by (2.9). For the left hand side the terms produced by (2.9) will be of the type $(S_l^{=})^{k_l}$ and will be equal to 0 unless $m \in \{1(1), 2(1)\}$ and $m' \notin \{1(1), 2(1)\}$. For a fixed $m'$ consider the sum of the two terms that correspond to $m = 1(1)$ and $m = 2(1)$, i.e.
For $m' \in \{1(2), 2(2), \ldots, 1(n), 2(n)\}$ this term will have a factor $\beta^2$, and for $m' = 2n + 1$ it will have a factor $-\beta^2(2n)$. Similarly, the derivative on the right hand side of (3.8) will consist of terms of the type For $m' \in \{1(1), 2(1), \ldots, 1(n), 2(n)\}$ this term will have a factor $\beta^2$, and for $m' = 2n + 3$ it will have a factor $-\beta^2(2n + 2)$. We will show next that for any $m$ and $m'$, This implies, for example, that all terms in the derivatives are "almost" independent of the index $m'$. This will also imply (3.8) since, given an arbitrary fixed $m'$, the left hand side of (3.8) will be equal to and the right hand side of (3.8) will be equal to which is the same up to terms of order $O(N^{-1})$. For simplicity of notation, instead of proving (3.9) we will prove Let us write the left hand side as and consider one term in this sum, for example, $\nu(U_N)$. Using (2.5), one can write since each term in the derivative already contains a factor $R^-_{l,l'} - q$. Thus, where $\nu_i$ is defined in the same way as $\nu_0$, only now the $i$-th coordinate plays the same role as the $N$-th coordinate plays for $\nu_0$ $(= \nu_N)$. Therefore, again using (2.13) and (2.12) and writing Similarly one can write, If we can finally show that this will prove (3.10) and (3.8). For example, if we consider $\nu_0(U_N)$, since all other terms are equal to 0. Similarly, one can easily see that This finishes the proof of (3.8).
The comparison of $R_{22}$ and $\bar R_{22}$ can be carried out in exactly the same way.