A note on Talagrand's positivity principle

Talagrand's positivity principle states that one can slightly perturb a Hamiltonian in the Sherrington-Kirkpatrick model in such a way that the overlap of two configurations under the perturbed Gibbs' measure becomes typically nonnegative. In this note we observe that abstracting from the setting of the SK model only improves the result and requires no modifications of Talagrand's argument. In this version, for example, the positivity principle immediately applies to the setting of the Aizenman-Sims-Starr interpolation. Abstracting from the SK model also improves the conditions in the Ghirlanda-Guerra identities and, as a consequence, a perturbation of smaller order suffices to ensure positivity of the overlap.


Introduction.
Let us consider the unit sphere S = {z ∈ R^N : |z| = 1} in Euclidean space R^N and let ν be a probability measure on S. Given a measurable function g : S → R, let us define a probability measure ν_g on S by the change of density

dν_g/dν = e^{g(z)} / ∫ e^{g(z)} dν(z).
We assume that e^{g(z)} in the denominator on the right-hand side is ν-integrable. Let us make a specific choice of g(z) given by

g(z) = v ∑_{p≥1} 2^{-p} x_p g_p(z), where g_p(z) = ∑_{1≤i_1,...,i_p≤N} g_{i_1,...,i_p} z_{i_1} ··· z_{i_p},   (1.1)

where v ≥ 0, the (x_p) are i.i.d. random variables uniform on [0, 1] and the (g_{i_1,...,i_p}) are i.i.d. standard Gaussian for all i_1, ..., i_p and all p ≥ 1.
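As a quick sanity check of this definition (not part of the note): the Gaussian family g_p has covariance E_g g_p(z^1) g_p(z^2) = (z^1 • z^2)^p, since g_p(z) is the pairing of an i.i.d. Gaussian tensor with the p-fold tensor power z^{⊗p}. A minimal numerical verification of the covariance identity:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 5, 3

# Two unit vectors z1, z2 on the sphere S = {z : |z| = 1}
z1 = rng.normal(size=N); z1 /= np.linalg.norm(z1)
z2 = rng.normal(size=N); z2 /= np.linalg.norm(z2)

def tensor_power(z, p):
    # z^{(x)p}: p-fold outer product of z with itself
    t = z
    for _ in range(p - 1):
        t = np.tensordot(t, z, axes=0)
    return t

# Since g_p(z) = <G, z^{(x)p}> for a tensor G of i.i.d. standard Gaussians,
# E g_p(z1) g_p(z2) = <z1^{(x)p}, z2^{(x)p}>, which factorizes to (z1 . z2)^p.
cov = np.sum(tensor_power(z1, p) * tensor_power(z2, p))
assert np.isclose(cov, np.dot(z1, z2) ** p)
print("covariance check passed")
```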
Given a function f on S^n, let us denote by ⟨f⟩ its average with respect to the measure ν_g^{⊗n}. Let us denote by E_g the expectation with respect to the Gaussian random variables and by E_x the expectation with respect to the uniform random variables in the definition of g(z) in (1.1). The following is the main result of this note.
Theorem 1 (Talagrand's positivity principle) For any ε > 0 there exists a large enough v ≥ 1 in (1.1) such that

E_x E_g ⟨I(z^1 • z^2 ≤ −ε)⟩ ≤ ε.   (1.2)

The choice of v does not depend on N and ν.
This means that one can define a random perturbation ν_g of an arbitrary measure ν such that the scalar product z^1 • z^2 of two vectors drawn independently from the distribution ν_g will be typically nonnegative. This result was proved in Section 6.6 of [8] in the setting of the Sherrington-Kirkpatrick model, where ν was a random Gibbs' measure, in which case the expectation on the left-hand side of (1.2) was also over the randomness of ν. The main ingredient of the proof was the Ghirlanda-Guerra identities, which are typically given in the setting of the SK model as well.
The main contribution of this note is the observation that abstracting from the setting of the SK model results in some qualitative improvements of the positivity principle of Theorem 1. First of all, we notice that Talagrand's proof in [8] requires no modifications in order to prove a positivity principle that holds uniformly over ν rather than on average over a random Gibbs' measure in the SK model. This observation, for example, implies the result in [9] without any additional work, as will be shown in the Example below. Another important qualitative improvement is the fact that the choice of v in (1.2) is independent of N. In [8], one needed v ≫ N^{1/4} - a condition that appears in the proof of the Ghirlanda-Guerra identities due to the fact that one controls a random Gibbs' measure from the very beginning. We will show below that the supremum of g(z) on the sphere is of order v√N, which means that one can perturb any measure on S by a change of density of order exp(v√N) and force the scalar product z^1 • z^2 to be essentially nonnegative.
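To indicate where the order v√N comes from (a sketch of ours for the first term of (1.1) only, not a display from the note):

```latex
% For p = 1, g_1(z) = \sum_{i \le N} g_i z_i and the supremum over the unit
% sphere is attained at z = g/|g|:
\sup_{|z|=1} v\, 2^{-1} x_1\, g_1(z) \;=\; \frac{v x_1}{2}\,|g|
   \;\approx\; \frac{v x_1}{2}\sqrt{N},
% since |g|^2 is a sum of N i.i.d. \chi^2_1 variables and concentrates
% around N. For fixed p \ge 2 the suprema of g_p are of the same order
% \sqrt{N} up to p-dependent constants, and the weights 2^{-p} keep the
% series summable, consistent with \sup_z |g(z)| \le L v \sqrt{N}.
```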
Example (Positivity in Guerra's replica symmetry breaking bound, [9]).The main result in [9] states that Guerra's replica symmetry breaking bound [3] applies to odd p-spin interactions as well.The proof utilizes the Aizenman-Sims-Starr version of Guerra's interpolation [1] and a positivity principle that requires a concentration inequality for the free energy along the interpolation.We observe that this positivity principle follows directly from Theorem 1.Let A be a countable set and let (w α ) be a probability function on A such that w α ≥ 0, α∈A w α = 1.Let H(z, α) be a function on Ω = Σ × A for some finite subset Σ of S. Let us consider a probability measure on Ω given by µ{(z, α)} ∼ w α exp H(z, α) + g(z) where g(z) is defined in (1.1).Then its marginal on Σ is equal to ν g if we define ν by ν{z} ∼ w α exp H(z, α).Therefore, By Theorem 1, for large enough v > 0, and this inequality holds uniformly over all choices of H and w.Therefore, we can average over arbitrary random distribution of H and w.In particular, in [9], (w α ) was a random Derrida-Ruelle process on A = N k and H was Guerra's interpolating Hamiltonian in the form of Aizenman-Sims-Starr.
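The marginalization step in the Example can be written out explicitly (a routine verification in our notation; Z denotes the normalizing constant of μ):

```latex
\mu\{z\} \;=\; \sum_{\alpha \in A} \mu\{(z,\alpha)\}
  \;=\; \frac{1}{Z}\, e^{g(z)} \sum_{\alpha \in A} w_\alpha e^{H(z,\alpha)}
  \;\sim\; e^{g(z)}\, \nu\{z\} \;\sim\; \nu_g\{z\},
% using \nu\{z\} \sim \sum_{\alpha \in A} w_\alpha \exp H(z,\alpha) and the
% change of density d\nu_g/d\nu \sim e^{g(z)} defining \nu_g. Hence the
% overlap z^1 \cdot z^2 of two independent draws from \mu has the same
% distribution as under \nu_g^{\otimes 2}.
```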
General remarks. (i) As we will see below, Talagrand's proof of the positivity principle uses very deep information about the joint distribution of the overlaps z^1 • z^l for 2 ≤ l ≤ n under the measure E_g ν_g^{⊗n} for a typical realization of (x_p). However, it is not clear whether this deep information is really necessary to prove positivity and whether some simpler argument would not suffice. For example, if we consider only the first-order Gaussian term in (1.1) and define g′(z) = v ∑_{i≤N} g_i z_i, then the measure ν_{g′} obtained by a change of density proportional to e^{g′(z)} would favor a random direction g = (g_1, ..., g_N), and it is conceivable that for large enough v, independent of ν and N, two independent vectors z^1 and z^2 from this measure would typically "point in the same direction", i.e. z^1 • z^2 would typically be nonnegative.
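To illustrate remark (i) numerically (a toy experiment under assumed parameters, not an argument from the note: ν is taken uniform on M = 500 fixed points of the sphere in R^3, ε = 0.2, and a single realization of the direction g is used), one can tilt ν by e^{g′(z)} and compare the probability of a negative overlap before and after the tilt:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, eps = 3, 500, 0.2

# nu: uniform measure on M fixed points of the sphere in R^3 (an arbitrary
# discrete example; the remark suggests the effect for a general nu)
Z = rng.normal(size=(M, N))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)

def overlap_neg_prob(v, g):
    # tilted measure d(nu_{g'}) proportional to exp(v g.z) dnu;
    # subtract the max exponent for numerical stability
    x = v * (Z @ g)
    w = np.exp(x - x.max())
    w /= w.sum()
    neg = (Z @ Z.T) <= -eps           # event {z1 . z2 <= -eps}
    return float(w @ neg @ w)         # probability over two independent draws

g = rng.normal(size=N)                # the random direction favored by g'
p0 = overlap_neg_prob(0.0, g)         # no perturbation
p1 = overlap_neg_prob(2000.0, g)      # strong first-order perturbation
print(p0, p1)                         # p1 should be much smaller than p0
```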
In fact, even a weaker result with v = o(√N) would be sufficient for applications such as the one in [9]. (ii) Theorem 1 implies the following non-random version of the positivity principle.
Corollary 1 For any ε > 0 there exists a large enough v > 0 such that the following holds. For any distribution Q on the set of probability measures on the sphere S there exists a (non-random) function g(z) such that, for some absolute constant L,

sup_{z∈S} |g(z)| ≤ Lv√N and ∫ ⟨I(z^1 • z^2 ≤ −ε)⟩ dQ(ν) ≤ ε.

It would be of interest to prove this result directly and not as a corollary of Theorem 1. Then, by the Hahn-Banach theorem, one could find a distribution P on the set of functions {g : sup_{z∈S} |g(z)| ≤ Lv√N} such that for all probability measures ν on S,

∫ ⟨I(z^1 • z^2 ≤ −ε)⟩ dP(g) ≤ ε.

This would give another proof of Theorem 1 with a non-constructive description of P.
Sketch of the proof of the positivity principle. The main ingredient in Talagrand's proof is the extended Ghirlanda-Guerra identities (Section 6.4 in [8]), which state that if we sample (z^1, ..., z^{n+1}) from the measure E_g ν_g^{⊗(n+1)} (which is a mixture of product measures of ν_g over the randomness of the Gaussian r.v.s) then, for a typical realization of (x_p), the scalar product z^1 • z^{n+1} is, with probability 1/n, independent of the overlaps z^1 • z^l, 2 ≤ l ≤ n, and with probability 1/n each it is equal to one of them. More precisely, the following holds.
Theorem 2 (Ghirlanda-Guerra identities) For any n ≥ 2, any measurable function f on S^n such that |f| ≤ 1 and any continuous function ψ on [−1, 1] we have

E_x | E_g⟨f ψ(z^1 • z^{n+1})⟩ − (1/n) E_g⟨f⟩ E_g⟨ψ(z^1 • z^2)⟩ − (1/n) ∑_{l=2}^{n} E_g⟨f ψ(z^1 • z^l)⟩ | ≤ δ(ψ, n, v),   (1.3)

where δ(ψ, n, v) → 0 as v → ∞ and δ does not depend on N and ν.
The main idea of Talagrand's proof can be briefly described as follows. Suppose that we sample z^1, ..., z^n independently from any measure on S. Then the event that all z^1 • z^l ≤ −ε simultaneously is very unlikely, and its probability is of order 1/(nε). The bound on this probability is uniform over all measures and therefore can be averaged over any distribution on measures, so it holds, for example, for E_g ν_g^{⊗n}. On the other hand, by the Ghirlanda-Guerra identities, under the measure E_g ν_g^{⊗n} the events {z^1 • z^l ≤ −ε} are strongly correlated, due to the fact that with some prescribed probabilities the overlaps z^1 • z^l can be equal. As a consequence, the simultaneous occurrence of these events has probability of order a/n^{1−a}, where a is the probability of one of them. But since this quantity is bounded by 1/(nε), taking n large enough shows that a must be small.
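Schematically, in our notation (a denotes the probability of a single event {z^1 • z^l ≤ −ε} and δ_n a quantity vanishing as v → ∞), the tension between the two bounds reads:

```latex
% Lower bound from the Ghirlanda-Guerra identities, upper bound from the
% uniform estimate for i.i.d. samples from any measure on the sphere:
\frac{a}{n^{1-a}} - |\delta_n|
  \;\le\; \Pr\big( z^1 \cdot z^l \le -\varepsilon,\; 2 \le l \le n \big)
  \;\le\; \frac{L}{n\varepsilon}.
% Comparing the two sides gives a\, n^{a} \lesssim L/\varepsilon, and
% letting n \to \infty forces a to be small.
```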

Proofs.
The proof of Theorem 2 follows exactly the same argument as the proof of the Ghirlanda-Guerra identities in [2] or in [8]. Since we consider a fixed measure ν, we do not need to control the fluctuations of a random Hamiltonian and, as a result, we get a better condition on v. The main part of the proof is the following lemma.
Lemma 1 For any p ≥ 1 there exists a constant L_p that depends on p only such that

E_x E_g ⟨| g_p(z) − E_g⟨g_p(z)⟩ |⟩ ≤ L_p √v.   (2.1)

Proof. Let us fix (x_p)_{p≥1} in the definition of g(z) and until the end of the proof let E denote the expectation with respect to the Gaussian random variables only. Define

θ = log ∫ e^{g(z)} dν(z), ψ = Eθ.
Given p ≥ 1, let us think of θ and ψ as functions of x = x_p only and define v_p = v2^{-p}. Then θ′(x) = v_p ⟨g_p(z)⟩ and we have

θ″(x) = v_p^2 (⟨g_p(z)^2⟩ − ⟨g_p(z)⟩^2) ≥ 0,   (2.2)

and by the Cauchy inequality

E⟨| g_p(z) − ⟨g_p(z)⟩ |⟩ ≤ ( E[⟨g_p(z)^2⟩ − ⟨g_p(z)⟩^2] )^{1/2} = v_p^{-1} (Eθ″(x))^{1/2}.   (2.3)

Averaging over x ∈ [0, 1] and using the Cauchy inequality once more, E_x (Eθ″(x))^{1/2} ≤ (ψ′(1) − ψ′(0))^{1/2} ≤ L v_p, since by Gaussian integration by parts |ψ′(x)| ≤ L v_p^2 for |x| ≤ 2; hence the average over x of the left-hand side of (2.3) is bounded by an absolute constant. To prove (2.1) it remains to approximate ⟨g_p(z)⟩ by E⟨g_p(z)⟩, and to achieve that we will use a simple consequence of the convexity of θ and ψ given in inequality (2.5) below. Since θ is a Lipschitz function of the Gaussian random variables, with |∇θ|^2 ≤ ∑_{q} v_q^2 x_q^2 ≤ v^2, we can apply a well-known Gaussian concentration inequality, for completeness given in Lemma 3 in Appendix A, to get

E|θ(x) − ψ(x)| ≤ Lv.   (2.4)

Below we will choose |y| ≤ 1, so that |x ± y| ≤ 2. (Remark. At the same step in the setting of the SK model one also needs to control a random Hamiltonian, which produces another term of order √N and results in the unnecessary condition on v in the positivity principle.) Inequality (2.5), averaged over x ∈ [0, 1] and combined with (2.4), the bound |ψ′| ≤ Lv_p^2 and |y| ≤ 1, implies

E_x E|θ′(x) − ψ′(x)| ≤ L ( y v_p^2 + v/y ),

and, using the explicit expressions θ′ = v_p ⟨g_p(z)⟩ and ψ′ = v_p E⟨g_p(z)⟩, we finally get

E_x E| ⟨g_p(z)⟩ − E⟨g_p(z)⟩ | ≤ L ( y v_p + v/(y v_p) ) ≤ L_p √v,

if we take y = v^{−1/2} ≤ 1. Together with (2.3) this gives (2.1). We now recall that E was the expectation E_g with respect to the Gaussian random variables only, and that the integral over x = x_p ∈ [0, 1] is nothing but the expectation with respect to x_p. Therefore, averaging over all remaining x_q, q ≠ p, finishes the proof.
The following inequality was used in the previous proof and it quantifies the fact that if two convex functions are close to each other then their derivatives are also close.
Lemma 2 If θ(x) and ψ(x) are convex differentiable functions and δ(x) = |θ(x) − ψ(x)| then, for any y > 0,

|θ′(x) − ψ′(x)| ≤ ψ′(x + y) − ψ′(x − y) + ( δ(x + y) + δ(x − y) + δ(x) ) / y.   (2.5)

Proof. By convexity, for any y > 0,

θ′(x) ≤ ( θ(x + y) − θ(x) ) / y ≤ ( ψ(x + y) − ψ(x) ) / y + ( δ(x + y) + δ(x) ) / y ≤ ψ′(x + y) + ( δ(x + y) + δ(x) ) / y.

Since ψ′(x) ≥ ψ′(x − y), this implies

θ′(x) − ψ′(x) ≤ ψ′(x + y) − ψ′(x − y) + ( δ(x + y) + δ(x) ) / y.

Similarly,

θ′(x) ≥ ( θ(x) − θ(x − y) ) / y ≥ ψ′(x − y) − ( δ(x) + δ(x − y) ) / y,

and since ψ′(x) ≤ ψ′(x + y) this implies ψ′(x) − θ′(x) ≤ ψ′(x + y) − ψ′(x − y) + ( δ(x) + δ(x − y) ) / y. Combining the two inequalities finishes the proof.
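As a sanity check of the convexity inequality |θ′(x) − ψ′(x)| ≤ ψ′(x+y) − ψ′(x−y) + (δ(x+y) + δ(x−y) + δ(x))/y, with δ = |θ − ψ| (our reading of (2.5)), one can test it numerically on an arbitrary pair of convex functions:

```python
import numpy as np

# Two convex differentiable functions and delta = |theta - psi|
theta, dtheta = np.cosh, np.sinh
psi   = lambda x: np.cosh(x) + 0.05 * x**2
dpsi  = lambda x: np.sinh(x) + 0.10 * x
delta = lambda x: np.abs(theta(x) - psi(x))

xs = np.linspace(-2.0, 2.0, 101)
for y in (0.1, 0.5, 1.0):
    lhs = np.abs(dtheta(xs) - dpsi(xs))
    rhs = dpsi(xs + y) - dpsi(xs - y) + (delta(xs + y) + delta(xs - y) + delta(xs)) / y
    assert np.all(lhs <= rhs + 1e-12)   # inequality holds pointwise on the grid
print("inequality (2.5) holds on the grid")
```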
Proof of Theorem 2. Since |f| ≤ 1, we can write

| E⟨f g_p(z^1)⟩ − E⟨f⟩ E⟨g_p(z^1)⟩ | ≤ E⟨| g_p(z^1) − E⟨g_p(z^1)⟩ |⟩.

By Gaussian integration by parts, the left-hand side is equal to n v 2^{-p} x_p φ_p, where

φ_p = | E⟨f (z^1 • z^{n+1})^p⟩ − (1/n) E⟨f⟩ E⟨(z^1 • z^2)^p⟩ − (1/n) ∑_{l=2}^{n} E⟨f (z^1 • z^l)^p⟩ |.

Lemma 1 then implies n v 2^{-p} E_x x_p φ_p ≤ L_p √v and, thus,

E_x x_p φ_p(x_p) ≤ L v^{−1/2}

for some constant L that depends only on n and p. Since φ_p ≤ 2, for any x_0 ∈ (0, 1),

E_x φ_p(x_p) ≤ 2 x_0 + x_0^{−1} E_x x_p φ_p(x_p) ≤ 2 x_0 + L v^{−1/2} / x_0,

and minimizing over x_0 we get E_x φ_p(x_p) ≤ L v^{−1/4}. Since any continuous function ψ on [−1, 1] can be approximated by polynomials, this implies the result.
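The integration-by-parts computation behind the identity for the left-hand side can be sketched as follows (our reconstruction; it uses the covariance E_g g_p(z^1) g_p(z^2) = (z^1 • z^2)^p and z • z = 1 on the sphere):

```latex
% Gaussian integration by parts for Gibbs averages w.r.t. \nu_g^{\otimes n},
% where g_p enters the density with coefficient v 2^{-p} x_p:
\mathbb{E}\langle f\, g_p(z^1)\rangle
  = v 2^{-p} x_p \Big( \mathbb{E}\langle f \rangle
    + \sum_{l=2}^{n} \mathbb{E}\langle f\, (z^1 \cdot z^l)^p \rangle
    - n\, \mathbb{E}\langle f\, (z^1 \cdot z^{n+1})^p \rangle \Big),
\qquad
\mathbb{E}\langle g_p(z^1) \rangle
  = v 2^{-p} x_p \big( 1 - \mathbb{E}\langle (z^1 \cdot z^2)^p \rangle \big).
% Subtracting \mathbb{E}\langle f\rangle \mathbb{E}\langle g_p(z^1)\rangle
% and dividing by n v 2^{-p} x_p yields exactly \varphi_p.
```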
Proof of Theorem 1.
Step 1. First we use the Ghirlanda-Guerra identities to give a lower bound on the probability that z^1 • z^l ≤ −ε for all 2 ≤ l ≤ n. In order to use Theorem 2 it will be convenient to denote by δ any quantity that depends on (x_p), such that E_x|δ| does not depend on ν and N and E_x|δ| → 0 as v → ∞. Then (1.3) can be written as

E_g⟨f ψ(z^1 • z^{n+1})⟩ = (1/n) E_g⟨f⟩ E_g⟨ψ(z^1 • z^2)⟩ + (1/n) ∑_{l=2}^{n} E_g⟨f ψ(z^1 • z^l)⟩ + δ.

Even though the function ψ is assumed to be continuous, let us use this result formally for ψ(x) = I(x ≤ −ε); the argument can easily be modified by using continuous approximations of the indicator function. Let

a = E_g⟨I(z^1 • z^2 ≤ −ε)⟩ and f_n = E_g⟨I(z^1 • z^l ≤ −ε, 2 ≤ l ≤ n)⟩.

Then by the Ghirlanda-Guerra identities, applied to f = I(z^1 • z^l ≤ −ε, 2 ≤ l ≤ n),

f_{n+1} = (1/n) a f_n + ((n − 1)/n) f_n + δ = ((a + n − 1)/n) f_n + δ.

By induction, since f_2 = a,

f_n ≥ a ∏_{l=2}^{n−1} (a + l − 1)/l + δ_n ≥ a/n^{1−a} + δ_n,   (2.6)

where E_x|δ_n| → 0 as v → ∞ and where the last inequality follows from a simple estimate for l ≥ 2: by Bernoulli's inequality, (a + l − 1)/l = 1 − (1 − a)/l ≥ (1 − 1/l)^{1−a}, so the product is at least (n − 1)^{a−1} ≥ n^{a−1}.

Step 2. On the other hand, we will show that f_n is essentially of order 1/(nε), and to emphasize the fact that this is true for any measure we now simply write G instead of ν_g. If z^1, ..., z^n are i.i.d. from a distribution G then

G^{⊗n}( z^1 • z^l ≤ −ε, 2 ≤ l ≤ n ) ≤ G(U) + γ^{n−1},   (2.7)

where, given 0 < γ < 1, we defined the set

U = { z ∈ S : G({y : z • y ≤ −ε}) > γ };

indeed, if z^1 ∉ U then, conditionally on z^1, the probability that all z^1 • z^l ≤ −ε for 2 ≤ l ≤ n is at most γ^{n−1}. We would like to show that if γ is close to 1 then G(U) is small. This follows from the fact that the average of z^1 • z^2 is nonnegative with respect to any product measure, since ∫∫ z^1 • z^2 dG(z^1) dG(z^2) = |∫ z dG(z)|^2 ≥ 0, and, therefore, G(U) ≤ 2(1 − γ)/ε. Since (2.7) holds uniformly over all measures G, it can be averaged over the randomness of ν_g, so that

f_n ≤ 2(1 − γ)/ε + γ^{n−1},

and we can minimize this bound over 0 < γ < 1 to get that f_n is at most of order 1/(nε), up to a factor logarithmic in n.

Step 3. Together with (2.6) this implies that a n^a ≤ L(log n)/ε + n|δ_n| and, therefore, for any fixed n, lim sup_{v→∞} E_x[a n^a] ≤ L(log n)/ε. On the event {a ≥ ε} we have a n^a ≥ ε n^ε, so that, for v large enough, P_x(a ≥ ε) ≤ 2L(log n)/(ε^2 n^ε) and hence E_x a ≤ ε + P_x(a ≥ ε) ≤ 2ε for n = n(ε) large enough. Since ε > 0 is arbitrary, this proves (1.2), and since none of the estimates involved N or ν, the choice of v does not depend on N and ν.

Appendix A.
For completeness, we state the Gaussian concentration inequality used in the proof of Lemma 1.

Lemma 3 Let X = F(g_1, ..., g_M), where (g_i)_{i≤M} are i.i.d. standard Gaussian random variables and F is a differentiable function such that |∇F|^2 ≤ a. Then, for any t > 0,

P(|X − EX| ≥ t) ≤ 2 exp(−t^2/(4a)).

Proof. Let g = (g_1, ..., g_M), let g′ be an independent copy of g and, for u ∈ [0, 1], define the interpolation g(u) = √(1 − u) g + √u g′. Given s > 0, let ϕ(u) = E exp s(F(g) − F(g(u))). A direct computation using Gaussian integration by parts shows that ϕ′(u) ≤ a s^2 ϕ(u). By construction ϕ(0) = 1 and the above inequality implies that ϕ(1) ≤ exp(as^2). On the other hand, by construction, ϕ(1) = E exp s(X − X′), where X′ is an independent copy of X. Thus, E exp s(X − X′) ≤ exp(as^2).
By Jensen's inequality, E exp s(X − EX) ≤ E exp s(X − X′) ≤ exp(as^2), and by Markov's inequality

P(X − EX ≥ t) ≤ exp(as^2 − st) = exp(−t^2/(4a)) for the optimal choice s = t/(2a).

Obviously, a similar inequality holds for EX − X and, therefore, P(|X − EX| ≥ t) ≤ 2 exp(−t^2/(4a)). The result follows.
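A quick Monte Carlo illustration of Lemma 3 (our check, with the assumed choice F(g) = max_i g_i, which is Lipschitz with |∇F| ≤ 1 almost everywhere, so a = 1; sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
M, samples, a = 10, 20000, 1.0   # F(g) = max of M std Gaussians, |grad F| <= 1

X = rng.normal(size=(samples, M)).max(axis=1)
EX = X.mean()                    # empirical stand-in for E X

for t in (1.0, 2.0, 3.0):
    emp = np.mean(np.abs(X - EX) >= t)      # empirical tail probability
    bound = 2 * np.exp(-t**2 / (4 * a))     # Lemma 3 bound
    assert emp <= bound
    print(f"t={t}: empirical {emp:.4f} <= bound {bound:.4f}")
```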