Large deviations, a phase transition, and logarithmic Sobolev inequalities in the block spin Potts model

We introduce and analyze a generalization of the block spin Ising (Curie-Weiss) models that were discussed in a number of recent articles. In these block spin models each spin in one of s blocks can take one of a finite number of q ≥ 3 values or colors, hence the name block spin Potts model. We prove a large deviation principle for the percentage of spins of a certain color in a certain block. These values are represented in an s × q matrix. We show that for uniform block sizes there is a phase transition: in one parameter regime the only equilibrium is the uniform distribution of all colors in all blocks, while in the other regime there is one predominant color, and this is the same color with the same frequency for all blocks. Finally, we establish logarithmic Sobolev-type inequalities for the block spin Potts model.


Introduction
Mean-field models such as the Curie-Weiss model are approximations of lattice models. They often show qualitatively interesting behavior (see [13] for a survey). Mean-field block models have been proposed as approximations of lattice models for metamagnets, see e.g. [24]. To describe them, assume that we have N interacting particles, each carrying a spin, and that we can group these particles into several groups. The interaction is such that particles within the same group interact with one interaction strength, while particles in different groups interact with another, usually smaller, strength. In a sequence of papers the statistical mechanics of such models was studied from various points of view, see [17,15,9,30,25,27,26]. In particular, they were discussed as models for social interactions between several groups, e.g. in [16,2,34,32] (the latter paper studies a combination of Ising models on Erdős-Rényi graphs as in [7,21,22] and block models). Recently, block models have also been studied in a statistical context (see [3], [31]). Here the task is to exactly recover the block structure from a given number of realizations of the model. It turns out that this can be done surprisingly effectively.
However, all the literature cited above deals with Ising spins, i.e. the spins take two values (usually ±1). Of course, the physics literature knows many more spin models than just the Ising model, in particular models with continuous spins such as Heisenberg and XY models. On the discrete side, Potts models (cf. e.g. [38,23,14,10]) are the most natural generalization of Ising models: each particle carries a spin from a finite set of cardinality 3 or larger. The aim of the present note is to investigate block spin Potts models as a natural generalization of block spin Ising models. We will basically concentrate on models where the blocks have approximately identical size and where the interaction is purely ferromagnetic, i.e. particles tend to have the same spin, no matter whether they are in the same block or in different ones. Similarly to [30] and [27], our main tools are large deviation techniques. Indeed, as we will see in Section 3, it is not too difficult to establish a large deviation principle for the "block magnetizations". However, deriving a limit theorem with an explicit limit law from there turns out to be more complicated than in the case of Ising spins (which is quite a common feature of Potts models).
The rest of this note is organized in the following way. In the next section we will describe the block spin Potts model. Section 3 contains a large deviation analysis of this model. In Section 4, we will concentrate on a version with blocks of asymptotically equal size and compute the possible limit laws for such models. Finally, in Section 5 we prove and briefly discuss (modified) logarithmic Sobolev inequalities for the block spin Potts model.
Let us mention at this point that, while we were finishing the current manuscript, we learned that in [29] the author studies a very similar model: there the number of blocks is restricted to two, but they may be of different sizes. His techniques, however, are different from ours. Moreover, extending his results, we prove a large deviation principle, locate the minima of the rate functions, and show logarithmic Sobolev inequalities.

The model
In the sequel we will consider the following model. Take the set S = {1, . . . , N} and partition S into s sets S_1, . . . , S_s. These sets will, of course, depend on N, and we assume that the limits γ_k := lim_{N→∞} |S_k|/N ∈ (0, 1) exist (and, of course, Σ_{k=1}^s γ_k = 1). Moreover, take an integer q ≥ 3 and for ω ∈ {1, . . . , q}^S and 0 < α < β introduce the Hamiltonian

H_N(ω) = H_{N,α,β}(ω) := −(β/(2N)) Σ_{i∼j} 1_{ω_i=ω_j} − (α/(2N)) Σ_{i≁j} 1_{ω_i=ω_j}.    (2.1)

Here i ∼ j means that the indices i and j belong to the same block S_k (where the case i = j is included) for some k ∈ {1, . . . , s}, while we write i ≁ j if this is not the case. With H_{N,α,β} we will associate the Gibbs measure

µ_N(ω) = µ_{N,α,β}(ω) := exp(−H_N(ω)) / Z_N,

where, of course, Z_{N,α,β} := Z_N := Σ_ω exp(−H_N(ω)). For k ∈ {1, . . . , s} and c ∈ {1, . . . , q} denote by m_{k,c} the relative number of spins of "color" c in the block S_k, i.e.

m_{k,c} = m_{k,c}(ω) := (1/|S_k|) Σ_{i∈S_k} 1_{ω_i=c}.
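To make these definitions concrete, the following small numerical sketch (not part of the paper; the function names are ours) evaluates the Hamiltonian with couplings 0 < α < β and the block magnetizations m_{k,c} on a tiny configuration:

```python
def hamiltonian(omega, block_of, alpha, beta):
    """H_N(omega): sum over all ordered pairs (i, j), the case i = j included,
    with coupling beta inside a block and alpha across blocks."""
    N = len(omega)
    H = 0.0
    for i in range(N):
        for j in range(N):
            if omega[i] == omega[j]:
                H -= (beta if block_of[i] == block_of[j] else alpha) / (2 * N)
    return H

def magnetizations(omega, block_of, s, q):
    """m_{k,c}: relative frequency of color c in block S_k, as an s x q array."""
    sizes = [block_of.count(k) for k in range(s)]
    m = [[0.0] * q for _ in range(s)]
    for i, c in enumerate(omega):
        m[block_of[i]][c - 1] += 1.0 / sizes[block_of[i]]
    return m
```

For instance, with N = 4 spins split into two blocks and colors ω = (1, 1, 2, 3), the first block is monochromatic while the second splits its mass between two colors.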
A short computation shows that (2.1) can be rewritten as

H_N(ω) = −(1/(2N)) tr(Bᵗ A_{α,β} B),

where A_{α,β} ∈ M(s × s) is the symmetric matrix with entries β on and α off the diagonal (the block interaction matrix) and B ∈ M(s × q) has entries

b_{k,c} := Σ_{i∈S_k} 1_{ω_i=c} = |S_k| m_{k,c},    (2.2)

and ⟨·, ·⟩ denotes the Frobenius scalar product. Hence the Hamiltonian is (up to the factor −1/(2N)) a positive definite quadratic form in the matrix B, and we will write tr(Bᵗ A B) = ⟨B, B⟩_A. Now, introducing the diagonal matrix Γ_N ∈ M(s × s) given by (Γ_N)_{k,k} = |S_k| and the matrix M_N := Γ_N⁻¹ B = (m_{k,c})_{k,c} of block magnetizations, we finally rewrite (2.1) as

H_N(ω) = −(1/(2N)) ⟨Γ_N M_N, Γ_N M_N⟩_{A_{α,β}}.

It is therefore natural to study the distribution of M_N under the Gibbs measure µ_N.
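The equality of the pairwise form of the Hamiltonian and its matrix form tr(Bᵗ A B)/(2N) can be checked numerically; the sketch below (our own illustration, not from the paper) compares the two on a random configuration:

```python
import numpy as np

def H_pairwise(omega, block_of, alpha, beta):
    """-(1/2N) * sum_{i,j} A_{block(i),block(j)} * 1{omega_i = omega_j}."""
    N = len(omega)
    same_block = np.equal.outer(block_of, block_of)
    same_color = np.equal.outer(omega, omega)
    J = np.where(same_block, beta, alpha)
    return -(J * same_color).sum() / (2 * N)

def H_matrix(omega, block_of, alpha, beta, s, q):
    """-(1/2N) * tr(B^t A B), b_{k,c} = number of spins of color c in block k."""
    N = len(omega)
    B = np.zeros((s, q))
    for i, c in enumerate(omega):
        B[block_of[i], c - 1] += 1
    A = np.full((s, s), alpha) + (beta - alpha) * np.eye(s)
    return -np.trace(B.T @ A @ B) / (2 * N)
```

Both functions agree because tr(Bᵗ A B) = Σ_{i,j} A_{block(i),block(j)} 1_{ω_i=ω_j}, the diagonal terms i = j included.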

A Large Deviation Principle for M_N
In this section we prove a Large Deviation Principle (LDP) for the matrix M_N. The analysis of the corresponding rate function will help us to determine the limiting behavior of M_N and to prove the existence of a phase transition. Let us briefly recall the definition of a large deviation principle (cf. [12] and [11] for a rich survey): For a Polish space X and an increasing sequence of non-negative real numbers (a_n)_{n∈N}, a sequence of probability measures (ν_n)_n on X is said to satisfy an LDP with speed a_n and rate function I : X → [0, ∞] (a lower semi-continuous function with compact level sets {x : I(x) ≤ L}, L ≥ 0) if for all Borel sets B ⊆ X

− inf_{x∈int(B)} I(x) ≤ lim inf_{n→∞} (1/a_n) log ν_n(B) ≤ lim sup_{n→∞} (1/a_n) log ν_n(B) ≤ − inf_{x∈cl(B)} I(x).

Here int(B) and cl(B) denote the topological interior and closure of a set B, respectively. A sequence of random variables X_n : Ω → X satisfies an LDP with speed a_n and rate function I : X → [0, ∞] under a sequence of measures µ_n if the push-forward sequence ν_n := µ_n ∘ X_n⁻¹ does.
Under the product measure ρ_N of uniform distributions on {1, . . . , q}, the blocks are independent, so that

ρ_N(M_N ∈ B) = Π_{k=1}^s ρ_N(M_N(k) ∈ B_k)

for any set B = Π_{k=1}^s B_k with Borel sets B_k ⊆ R^q (here we associate probabilities ν on the set {1, . . . , q} with vectors in R^q and define H(ν|ρ) = H(ν) = ∞ if ν ∈ R^q does not have non-negative components summing to 1). Together with the above-mentioned LDP for the components M_N(k) and the assumption that |S_k|/N converges to γ_k as N → ∞, this observation implies that the matrix M_N under ρ_N obeys an LDP with speed N and rate function

I(ν) := Σ_{k=1}^s γ_k H(ν_k | ρ),

where ρ is the uniform distribution on {1, . . . , q}. Here ν := (ν_k)_{1≤k≤s} ∈ M(s × q) and the ν_k are probabilities on {1, . . . , q}; otherwise I(ν) is defined to be ∞. Thus we have seen that M_N satisfies an LDP under ρ_N. Transferring this LDP to the Gibbs measures yields the rate function

J(ν) := I(ν) − (1/2)⟨Γν, Γν⟩_{A_{α,β}} − inf_µ ( I(µ) − (1/2)⟨Γµ, Γµ⟩_{A_{α,β}} ).

Here Γ is the s × s diagonal matrix with (Γ)_{kk} = γ_k, ν := (ν_k)_k and µ := (µ_k)_k are s × q matrices, and the ν_k and µ_k are probabilities on {1, . . . , q}; otherwise J(ν) is defined to be ∞.
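As a sanity check of the Sanov-type rate function I(ν) = Σ_k γ_k H(ν_k | ρ) described above, the following sketch (our own; the reconstruction of I from the text is an assumption) evaluates it, returning ∞ off the simplex as in the convention of the text:

```python
import math

def rel_entropy_uniform(nu, q):
    """H(nu | rho) for rho uniform on {1,...,q}; infinite outside the simplex."""
    if any(p < 0 for p in nu) or abs(sum(nu) - 1.0) > 1e-12:
        return math.inf
    return sum(p * math.log(p * q) for p in nu if p > 0)

def rate_I(nu_rows, gammas, q):
    """I(nu) = sum_k gamma_k * H(nu_k | rho), one row nu_k per block."""
    return sum(g * rel_entropy_uniform(row, q) for g, row in zip(gammas, nu_rows))
```

The rate vanishes exactly at the matrix with all rows uniform, and a Dirac row in block k contributes γ_k log q, the maximal cost.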
Define C(γ) := {µ ∈ M(s × q) : µ_{k,c} ≥ 0 and Σ_{c=1}^q µ_{k,c} = γ_k for all k}. Note that every matrix µ ∈ C(γ) can actually be considered as a probability distribution on [sq], and the term −Σ_{k∈[s]} Σ_{c∈[q]} µ_{k,c} log(µ_{k,c}) is its entropy. However, the set C(γ) places restrictions on the mass that can be placed on every block.

Equilibria for uniform block sizes
An LDP as in Theorem 3.2 or Theorem 3.3 is, in principle, able to determine the limit distributions of the matrix-valued random variables M_N and M̃_N under the sequence of Gibbs measures µ_{N,α,β}. Indeed, they are given by the minima of the corresponding rate functions. Proof. This is folklore in large deviation theory and not difficult to prove, once one realizes that the upper bound in an LDP implies that any measurable set whose closure does not contain a minimum of the rate function has a probability that converges to 0.
We will, in the sequel, determine the minima of J̃, which will also give us the minima of J. We start with the observation that the minimum points of J̃ are the maximum points of

G(µ) := (1/2)⟨µ, µ⟩_{A_{α,β}} − Σ_{k=1}^s Σ_{c=1}^q µ_{k,c} log µ_{k,c},

where µ = (µ_{k,c}) ∈ C(γ) and we have set 0 log 0 := 0.
Proof. Suppose one of µ's entries equals 0, without loss of generality µ_{11} = 0. Then there is 2 ≤ i ≤ q such that µ_{1i} ≥ γ_1/(q − 1). Note that G is the sum of a polynomial of degree two in the µ_{k,c}'s and −Σ_{k=1}^s Σ_{c=1}^q µ_{k,c} log µ_{k,c}. Now −t log t has derivative +∞ at 0. Hence, for ε > 0 small enough, we have G(µ) < G(µ′), where µ′ is the matrix that we obtain from µ if we replace µ_{11} by µ′_{11} = ε and µ_{1i} by µ′_{1i} = µ_{1i} − ε and leave the other entries unaltered.
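The perturbation argument of this proof can be observed numerically. The sketch below (our own illustration; the quadratic-plus-entropy form of G is our reading of the proof and should be treated as an assumption) moves a small mass ε onto a zero entry and sees G increase, driven by the infinite slope of −t log t at 0:

```python
import numpy as np

def G(mu, alpha, beta):
    """(1/2)<mu, mu>_A plus the entropy term -sum mu log mu, with 0 log 0 := 0.
    The quadratic part uses the block interaction matrix A (beta on, alpha off
    the diagonal); this explicit form is an assumption for illustration."""
    s = mu.shape[0]
    A = np.full((s, s), alpha) + (beta - alpha) * np.eye(s)
    quad = 0.5 * np.trace(mu.T @ A @ mu)
    pos = mu[mu > 0]
    return quad - np.sum(pos * np.log(pos))

# A matrix in C(gamma) with mu_11 = 0 (rows sum to gamma_k = 1/2, s = 2, q = 3)
mu0 = np.array([[0.0, 0.25, 0.25], [1 / 6, 1 / 6, 1 / 6]])
eps = 1e-4
mu_eps = mu0.copy()
mu_eps[0, 0] += eps   # move eps of mass onto the zero entry ...
mu_eps[0, 1] -= eps   # ... taken from a positive entry of the same row
```

For small ε the entropy gain −ε log ε dominates every other change, so the perturbed matrix has strictly larger G, exactly as claimed.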
Let us now apply the method of Lagrange multipliers to find the maximum points of G. Let λ = (λ_1, λ_2, . . . , λ_s). We then need to find the critical points of

L(µ, λ) := G(µ) − Σ_{k=1}^s λ_k ( Σ_{c=1}^q µ_{k,c} − γ_k ),

which leads to one equation per pair (k, c), stated in (4.2). Summing these equations over all c determines the multipliers λ_k, and plugging this into (4.2) we finally arrive at our system of critical equations (4.3) that any maximum point has to solve. Let us rephrase the value of G in critical points: multiplying (4.3) by µ_{k,c} and summing over c yields an explicit expression for G(µ_crit) whenever µ_crit is a critical point of G. Next we will see that of all critical points only those where all rows of µ have the same, e.g. an increasing, order are relevant.
Let µ ∈ C(γ), let µ↑ denote the matrix whose rows are the increasingly ordered rows of µ, and for permutations σ_k ∈ S_q let µ^σ be the matrix with entries µ_{k,σ_k(c)}. Then

G(µ↑) ≥ G(µ^σ)    (4.5)

for all s-tuples σ = (σ_k)_k of permutations σ_k ∈ S_q. In particular, for α > 0, the function G can only be maximal in a point µ if the rows of µ are ordered in the same way, i.e. if there is a σ ∈ S_q such that µ_{kσ(1)} ≤ . . . ≤ µ_{kσ(q)} for all 1 ≤ k ≤ s.
Proof. Recall the rearrangement inequality [20, Th. 368]: when x_1 ≤ . . . ≤ x_n and y_1 ≤ . . . ≤ y_n are sequences of real numbers, then for every permutation π ∈ S_n one has

Σ_{i=1}^n x_i y_{π(i)} ≤ x_1 y_1 + x_2 y_2 + . . . + x_n y_n,

and the inequality is strict if there are indices j < j′ with x_j < x_{j′} and y_{π(j)} > y_{π(j′)}. Applying this to rows k and k′ of µ gives Σ_{c=1}^q µ_{kc} µ_{k′c} ≥ Σ_{c=1}^q µ_{kσ_k(c)} µ_{k′σ_{k′}(c)} for every two permutations σ_k, σ_{k′} ∈ S_q. Summing over all k ≠ k′ yields (4.5). In particular, µ is not a maximum point if there are two rows k ≠ k′ and indices j < j′ with µ_{kj} < µ_{kj′} and µ_{k′j} > µ_{k′j′}.
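The rearrangement inequality used in this proof is easy to verify exhaustively on small examples; the following sketch (ours, for illustration) checks that the commonly ordered pairing dominates every permutation:

```python
import itertools

def sorted_dot_is_max(x, y):
    """For increasingly sorted x and y, sum x_i * y_i dominates
    sum x_i * y_pi(i) for every permutation pi (rearrangement inequality)."""
    x, y = sorted(x), sorted(y)
    aligned = sum(a * b for a, b in zip(x, y))
    return all(sum(a * y[p] for a, p in zip(x, perm)) <= aligned + 1e-12
               for perm in itertools.permutations(range(len(y))))
```

In the proof, x and y play the role of two rows of µ, so aligning their orders can only increase the cross term Σ_c µ_{kc} µ_{k′c}.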
So we can and will assume in the following that all rows of a critical point µ of G are increasing. The next lemma determines the structure of a critical µ. 1. If µ_{kc} = µ_{kc′} for some k and c ≠ c′, then µ_{k′c} = µ_{k′c′} for all 1 ≤ k′ ≤ s. 2. Each row of µ has at most two different entries.
Proof. Subtracting (4.3) for c from the equation for c′ yields α Σ_{k′≠k} µ_{k′c} = α Σ_{k′≠k} µ_{k′c′} and thus, by the increasing order, µ_{k′c} = µ_{k′c′} for all 1 ≤ k′ ≤ s. This is the first claim.
For two different columns c ≠ c′ we obtain from (4.3) a relation between the entries µ_{kc} and µ_{kc′}. Now, if we had three columns c < c′ < c′′ with µ_{kc} < µ_{kc′} < µ_{kc′′} for one row k (and hence for all rows, due to the first part of this lemma), this relation would lead to a contradiction. Hence we can constrain our search for maximum points of G to matrices µ with positive entries, increasingly ordered rows and at most two different columns. Taking into account that the entries in row k sum up to γ_k, we see that the largest column µ⁺ (component by component) of µ, together with the number 1 ≤ r ≤ q of columns equal to µ⁺, is all the information we need to build up µ: either µ has q identical columns γ/q, or the (increasingly ordered) µ consists of q − r identical columns with k-th entry (γ_k − rµ⁺_k)/(q − r), followed by r columns equal to µ⁺. At this stage we do not know how to proceed with the case of an arbitrary γ. However, a proof for the case of asymptotically equal block sizes γ_k = 1/s for all k is readily accomplished.
We regard the right-hand side of this equation as a function of µ⁺_k, say ψ(µ⁺_k). Now, if the largest entry of the vector µ⁺ occurs in line K (and perhaps somewhere else), but not in every line, then

ψ(µ⁺_K) − α(sµ⁺_K − 1/q) < 0.

Since ψ(t) diverges to +∞ when t approaches 1/(sr) from below, we can find a t_0 > µ⁺_K with ψ(t_0) = α(st_0 − 1/q). Building a matrix ν by taking, instead of r columns equal to µ⁺, r columns with identical entries t_0, and completing the matrix with q − r columns with identical entries (1 − srt_0)/(s(q − r)), we see that ν is (well defined and) a critical point. So we just have to prove that G(µ) < G(ν).
Wrapping up what we have seen, we can now reduce the maximization to matrices with identical rows taken from the set V of probability vectors on {1, . . . , q}. Proof. The maximum of G on C(γ) is attained on the subset of matrices with identical rows taken from {v/s | v ∈ V}. However, up to a minus sign and ignoring the summand log s, the value of G on this subset is the free energy functional of a Potts model at inverse temperature g := (β + (s − 1)α)/s.
The following theorem hence follows from the results in [23,14], where the critical temperature and the behaviour of the Potts model are computed. To describe the "low temperature" behavior, define the function ϕ : [0, 1] → R^q,

ϕ(u) := (1/(qs)) ( 1 + (q − 1)u, 1 − u, . . . , 1 − u ),

and let u(g) be the largest solution of the equation

u = (1 − e^{−gu}) / (1 + (q − 1)e^{−gu}).

Let n_1(g) := ϕ(u(g)) and let n_i(g) be n_1(g) with the i-th and the first coordinate interchanged, i = 2, . . . , q. Let ν_i(g) be the matrix with all rows identical to n_i(g) and Q be the matrix with all entries identical to 1/(qs).
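The largest solution u(g) of the fixed-point equation above is straightforward to compute numerically; here is a sketch (ours, not from the paper). Since the map u ↦ (1 − e^{−gu})/(1 + (q − 1)e^{−gu}) is increasing and maps [0, 1] into itself, iterating it from u = 1 decreases monotonically to the largest fixed point:

```python
import math

def u_of_g(g, q, iters=300):
    """Largest solution of u = (1 - e^{-gu}) / (1 + (q-1) e^{-gu}),
    by fixed-point iteration started at u = 1; the iteration is
    monotonically decreasing, hence converges to the largest root."""
    u = 1.0
    for _ in range(iters):
        u = (1 - math.exp(-g * u)) / (1 + (q - 1) * math.exp(-g * u))
    return u
```

For small g the only solution is u = 0 (the uniform phase), while for large g the iteration settles near 1, corresponding to one strongly predominant color.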
Then, if g < ζ_q, the distribution of M_N under the Gibbs measure µ_{N,α,β} concentrates in the Dirac measure in Q, while if g > ζ_q it concentrates in a (uniform) mixture of the Dirac measures in ν_1(g), . . . , ν_q(g).
At g = ζ_q the limit points of M_N under the Gibbs measure µ_{N,α,β} are Q and ν_1(g), . . . , ν_q(g).

Logarithmic Sobolev inequalities
In this section, we present logarithmic Sobolev inequalities (LSIs) for block spin Potts models. LSIs are frequently used, e.g., in concentration of measure theory, where they form the core of the well-known entropy method (cf. the monographs [28,6]). Recently, LSIs for various types of finite spin systems have been established ([18,35]), a line of research we continue by considering block spin Potts models. For ω = (ω_i)_{i∈S} ∈ {1, . . . , q}^S and i ∈ S, let ω_{i^c} := (ω_j)_{j≠i}. Moreover, for any function f : {1, . . . , q}^S → R, we define a certain "difference operator" by

|df|²(ω) := Σ_{i∈S} ∫ ( f(ω) − f(ω_{i^c}, ω′_i) )² dµ_N(ω′_i | ω_{i^c}),

where µ_N(· | ω_{i^c}) denotes the regular conditional probability of the i-th spin given the remaining ones. The integrals may be regarded as a kind of "local variance" in the i-th coordinate. The difference operator d is a well-known object, and ∫ |df|² dµ_N can be regarded as a Dirichlet form (cf. [18, Rem. 2.2]). Finally, for any non-negative function f,

Ent_{µ_N}(f) := ∫ f log(f) dµ_N − ∫ f dµ_N log( ∫ f dµ_N )

denotes the entropy.
Here, σ 1 , σ 2 , σ 3 > 0 are constants which depend on β and q only. Finally, note that (5.2) is frequently used in the context of Markov processes, and it is equivalent to exponential decay of the relative entropy along the Glauber semigroup (cf. e. g. [4,8]). It moreover implies that the associated Glauber dynamics is rapidly mixing, i. e. its mixing time is O(N log N ), see [35,Th. 2.2]. This complements [5], where a different situation was considered (the usual Potts model without blocks but on graphs with fixed maximal degree).
For the proof of Theorem 5.1, recall that for product measures µ = ⊗_{i=1}^N µ_i the entropy functional tensorizes, i.e.

Ent_µ(f) ≤ Σ_{i=1}^N ∫ Ent_{µ_i}(f) dµ,

and therefore proving LSIs reduces to controlling each coordinate separately, i.e. to a "one-dimensional" case. For non-product measures, if the dependencies are sufficiently weak, an analogue can be shown which is called an approximate tensorization property. A criterion for approximate tensorization in probability spaces with finitely many atoms was introduced in [33], which we will exploit in the sequel. Proposition 5.2. Assume that 2qβe^β < 1. For N large enough, the approximate tensorization property of entropy holds, i.e.

Ent_{µ_N}(f) ≤ C Σ_{i∈S} ∫ Ent_{µ_N(· | ω_{i^c})}(f) dµ_N
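The (exact) tensorization inequality for product measures quoted above can be checked directly on a small finite product space; the following sketch (ours, for illustration) computes both sides for a product of uniforms and an arbitrary positive function:

```python
import itertools
import math

def ent(weights, f):
    """Ent(f) = E[f log f] - E[f] log E[f] over a finite probability space,
    given as a dict {point: probability}."""
    ef = sum(p * f(w) for w, p in weights.items())
    return sum(p * f(w) * math.log(f(w)) for w, p in weights.items()) - ef * math.log(ef)

q, n = 3, 2
space = list(itertools.product(range(q), repeat=n))
mu = {w: 1.0 / len(space) for w in space}      # product of two uniforms on {0,1,2}
f = lambda w: 1.0 + w[0] + 2.0 * w[1]          # an arbitrary positive function

lhs = ent(mu, f)
rhs = 0.0
for i in range(n):
    for w, p in mu.items():
        # conditional law of coordinate i given the rest is uniform; the
        # conditional entropy does not depend on w_i, so averaging over all
        # of w realizes the integral against mu
        cond = {x: 1.0 / q for x in range(q)}
        rhs += p * ent(cond, lambda x, w=w, i=i: f(w[:i] + (x,) + w[i + 1:]))
```

The global entropy lhs is indeed bounded by the sum rhs of integrated coordinate-wise entropies, which is the mechanism the approximate tensorization of Proposition 5.2 transfers to the non-product Gibbs measure.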
with C depending on β and q only.
Proof. We shall apply Marton's approximate tensorization result [33] in the slightly rewritten and corrected form stated in [35, Th. 4.1]. Essentially, we need to control the conditional probabilities µ_N(· | ω_{i^c}), which we rewrite in the sequel, generalizing the case of the usual Potts model without blocks as in [37, Prop. 2.16]. Recalling the matrix B = (b_{k,c}) from (2.2), we fix two sites i, j ∈ S and define

b_{k,ij,c} := Σ_{ν∈S_k, ν∉{i,j}} 1_{ω_ν=c}.

Then,