Optimizing Mean Field Spin Glasses with External Field

We consider the Hamiltonians of mean-field spin glasses, which are certain random functions $H_N$ defined on high-dimensional cubes or spheres in $\mathbb R^N$. The asymptotic maximum values of these functions were famously obtained by Talagrand and later by Panchenko and by Chen. The landscape of approximate maxima of $H_N$ is described by various forms of replica symmetry breaking exhibiting a broad range of possible behaviors. We study the problem of efficiently computing an approximate maximizer of $H_N$. We give a two-phase message passing algorithm to approximately maximize $H_N$ when a no-overlap-gap condition holds. This generalizes several recent works by allowing a non-trivial external field. For even Ising spin glasses with constant external field, our algorithm succeeds exactly when known overlap-gap obstructions fail to rule out approximate maximization by a wide class of algorithms. Moreover we give a branching variant of our algorithm which constructs a full ultrametric tree of approximate maxima.


Introduction
Optimizing non-convex functions in high dimensions is well known to be computationally intractable in general. In this paper we study the optimization of a natural class of random non-convex functions, namely the Hamiltonians of mean-field spin glasses. These functions $H_N$ are defined on either the cube $\Sigma_N = \{-1,1\}^N$ or the sphere $S^{N-1}(\sqrt N)$ of radius $\sqrt N$ and have been studied since [SK75] as models for the behavior of disordered magnetic systems.
The distribution of an $N$-dimensional mean-field spin glass Hamiltonian $H_N$ is described by an exponentially decaying sequence $(c_p)_{p\ge 2}$ of non-negative real numbers as well as an external field probability distribution $L_h$ on $\mathbb R$ with finite second moment. Given these data, one samples $h_1,\dots,h_N \sim L_h$ and standard Gaussians $g_{i_1,\dots,i_p}\sim N(0,1)$ and then defines $H_N:\mathbb R^N\to\mathbb R$ by
$$H_N(x) = \sum_{i=1}^N h_i x_i + \widetilde H_N(x), \qquad \widetilde H_N(x) = \sum_{p\ge 2} \frac{c_p}{N^{(p-1)/2}} \sum_{i_1,\dots,i_p=1}^N g_{i_1,\dots,i_p}\, x_{i_1}\cdots x_{i_p}.$$
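To make the definition concrete, here is a minimal numerical sketch of sampling and evaluating such a Hamiltonian, assuming a mixture with only $p=2$ and $p=3$ terms and small $N$ (the direct tensor contraction below is an unoptimized illustration of the formula above, not an implementation used in this paper):

```python
import numpy as np

def sample_hamiltonian(N, c2=1.0, c3=0.5, h_std=0.3, rng=None):
    """Sample disorder for a mixed 2+3-spin Hamiltonian with Gaussian external field."""
    rng = rng or np.random.default_rng(0)
    h = h_std * rng.standard_normal(N)        # external field coordinates h_i ~ L_h
    G2 = rng.standard_normal((N, N))          # Gaussian couplings g_{i1,i2}
    G3 = rng.standard_normal((N, N, N))       # Gaussian couplings g_{i1,i2,i3}
    def H(x):
        quad = c2 / N**0.5 * np.einsum("ij,i,j->", G2, x, x)     # p = 2 term
        cub = c3 / N * np.einsum("ijk,i,j,k->", G3, x, x, x)     # p = 3 term
        return h @ x + quad + cub
    return H

H = sample_hamiltonian(N=200)
sigma = np.sign(np.random.default_rng(1).standard_normal(200))   # a point of Σ_N
print(H(sigma) / 200)   # normalized energy H_N(σ)/N, typically of order 1
```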
The distribution of the non-linear part $\widetilde H_N$ is characterized by the mixture function $\xi(z) = \sum_{p\ge 2} c_p^2 z^p$; there are no issues of convergence for $|z|\le 1+\eta$ thanks to the exponential decay assumption. We assume throughout that $\xi$ is not the zero function so that we study a genuine spin glass. $\widetilde H_N$ is then a centered Gaussian process with covariance
$$\mathbb E\big[\widetilde H_N(x)\,\widetilde H_N(y)\big] = N\,\xi\big(\langle x,y\rangle_N\big).$$

Spin glasses were introduced to model the magnetic properties of diluted materials and have been studied in statistical physics and probability since the seminal work [SK75]. In this context, the object of study is the Gibbs measure
$$\frac{e^{\beta H_N(x)}\,d\mu(x)}{Z_{N,\beta}}$$
where $\beta>0$ is the inverse temperature, $\mu$ is a fixed reference measure and $Z_{N,\beta}$ is a random normalizing constant known as the partition function. The most common choice is to take $\mu$ the uniform measure on $\Sigma_N=\{-1,1\}^N$, and another canonical choice is the uniform measure on $S^{N-1}(\sqrt N)$. These two choices define Ising and spherical spin glasses. The quantity of primary interest is the free energy $F_N(\beta) = \log\int e^{\beta H_N(x)}\,d\mu(x)$. The in-probability normalized limit $F(\beta) = \text{p-lim}_{N\to\infty}\frac{F_N(\beta)}{N}$ of the free energy at temperature $\beta$ is famously given by an infinite-dimensional variational problem known as the Parisi formula (or the Crisanti–Sommers formula in the spherical case) as we review in the next section. These free energies are well concentrated, and taking a second limit $\lim_{\beta\to\infty}\frac{F(\beta)}{\beta}$ yields the asymptotic ground state energies.

From the point of view of optimization, spin glass Hamiltonians serve as natural examples of highly non-convex functions. Indeed, the landscape of $H_N$ can exhibit quite complicated behavior. For instance $H_N$ may have exponentially many near-maxima on $\Sigma_N$ [Cha09, DEZ+15, CHL18]. The structure of these near-maxima is highly nontrivial; the Gibbs measures on $\Sigma_N$ are approximate ultrametrics in a certain sense, at least in the so-called generic models [Jag17, CS21]. Moreover spherical spin glasses typically have exponentially many local maxima and saddle points, which are natural barriers to gradient descent and similar optimization algorithms [ABA13, Auf13, Sub17, AMMN19]. The utility of a rich model of random functions is made clear by a comparison to the theory of high-dimensional non-convex optimization in the worst-case setting. In the black-box model of optimization based on querying function values, gradients, and Hessians, approximately optimizing an unknown non-convex function in high dimension efficiently is trivially impossible, and substantial effort has gone towards the more modest task of finding a local optimum or stationary point [CDHS17, JGN+17, AAZB+17, CDHS18, CDHS19]. Even for quadratic polynomials in $N$ variables, it is quasi-NP-hard to come within a factor $\log(N)^\varepsilon$ of the optimum [ABE+05]. For polynomials of degree $p\ge 3$ on the sphere, [BBH+12] proves that even an approximation ratio $e^{(\log N)^\varepsilon}$ is computationally infeasible to obtain.
Despite the worst-case obstructions just outlined, a series of recent works have found great success in approximately maximizing certain spin glass Hamiltonians. By approximate maximization we always mean maximization up to a factor $(1+\varepsilon)$, where $\varepsilon>0$ is an arbitrarily small positive constant; we similarly refer to a point $x\in\Sigma_N$ or $x\in S^{N-1}(\sqrt N)$ achieving such a nearly optimal value as an approximate maximizer (where the small constant $\varepsilon$ is implicit). Subag showed in [Sub21] how to approximately maximize spherical spin glasses by using top eigenvectors of the Hessian $\nabla^2 H_N$. Subsequently [Mon19, AMS21] developed a message passing algorithm with similar guarantees for the Ising case. These works all operate under an assumption of no overlap gap, a condition which is expected (and known in the spherical setting) to hold for some but not all models $(\xi, L_h)$; otherwise they achieve an explicit, sub-optimal energy value. Such a no-overlap-gap assumption is expected to be necessary to find approximate maxima efficiently. Indeed, the works [AJ18, GJ21, GJW20] rule out various algorithms for optimizing spin glasses when an overlap gap holds. Variants of the overlap gap property have been shown to rule out $(1+\varepsilon)$-approximation by certain classes of algorithms for random optimization problems on sparse graphs [MMZ05, ACORT11, GS14, RV17, GS17, CGPR19, Wei22]. Overlap gaps have also been proposed as evidence of computational hardness for a range of statistical tasks including planted clique, planted dense submatrix, sparse regression, and sparse principal component analysis [GZ17, GL18, GJS21, GZ19, AWZ20]. We discuss overlap gaps more in Subsection 1.2 and Section 6.
The aforementioned algorithms in [Sub21, Mon19, AMS21] are all restricted to settings with no external field, i.e. with $h_i = 0$ for all $i$. Our main purpose is to generalize these results to allow for an external field. We focus primarily on the Ising case and explain in Section 5 how to handle the easier spherical models. Our main algorithm consists of two stages of message passing. The first stage is inspired by the work [Bol14] which constructs solutions to the TAP equations for the SK model at high temperature. We construct approximate solutions to the generalized TAP equations of [Sub18, CPS21a, CPS21b], which heuristically amounts to locating the root of the ultrametric tree of approximate maxima. The second stage is similar to [Mon19, AMS21] and uses incremental approximate message passing to descend the ultrametric tree by simulating the SDE corresponding to a candidate solution for the Parisi variational problem. A related two-stage message passing algorithm was recently introduced in our joint work with A. El Alaoui on the negative spherical perceptron [AS20].
While the primary goal in this line of work is to construct a single approximate maximizer, Subag beautifully observed in [Sub21, Remark 6] that an extension of his Hessian-based construction for spherical models produces approximate maximizers arranged into a completely arbitrary ultrametric space obeying an obvious diameter upper bound. The overlap gap property essentially states that distances between approximate maximizers cannot take certain values, and so this is a sort of constructive converse result. In Section 4 we give a branching version of our main algorithm, following a suggestion of [AM20], which constructs an arbitrary ultrametric space of approximate maximizers in the Ising case (again subject to a diameter upper bound).

Optimizing Ising Spin Glasses
To state our results we require the Parisi formula for the ground state of a mean-field Ising spin glass as given in [AC17]. Let $U$ be the function space
$$U = \Big\{\gamma:[0,1)\to\mathbb R_{\ge 0}\ \text{non-decreasing}:\ \int_0^1 \xi''(t)\gamma(t)\,dt < \infty\Big\}.$$
The functions $\gamma$ are meant to correspond to cumulative distribution functions; for finite $\beta$ the corresponding Parisi formula requires $\gamma(1)=1$, but this constraint disappears in renormalizing to obtain a zero-temperature limit. For $\gamma\in U$ we take $\Phi_\gamma:[0,1]\times\mathbb R\to\mathbb R$ to be the solution of the following Parisi PDE:
$$\partial_t\Phi_\gamma(t,x) + \frac{\xi''(t)}{2}\Big(\partial_{xx}\Phi_\gamma(t,x) + \gamma(t)\big(\partial_x\Phi_\gamma(t,x)\big)^2\Big) = 0, \qquad \Phi_\gamma(1,x) = |x|. \tag{1.2}$$
This PDE is known to be well-posed, see Proposition 2.6. Intimately related to the above PDE is the stochastic differential equation
$$dX_t = \xi''(t)\gamma(t)\,\partial_x\Phi_\gamma(t,X_t)\,dt + \sqrt{\xi''(t)}\,dB_t, \qquad X_0 = h\sim L_h, \tag{1.1}$$
which we call the Parisi SDE. The Parisi functional $\mathcal P: U\to\mathbb R$ with external field distribution $L_h$ is given by:
$$\mathcal P(\gamma) = \mathbb E^{h\sim L_h}\big[\Phi_\gamma(0,h)\big] - \frac12\int_0^1 t\,\xi''(t)\gamma(t)\,dt.$$
The Parisi formula for the ground state energy is as follows.

Theorem 1 ([AC17]). The asymptotic ground state energy is
$$GS(\xi, L_h) = \text{p-lim}_{N\to\infty}\,\max_{\sigma\in\Sigma_N}\frac{H_N(\sigma)}{N} = \inf_{\gamma\in U}\mathcal P(\gamma).$$
Moreover the minimum is attained at a unique $\gamma^U_*\in U$.
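Although we use $\Phi_\gamma$ only through oracle access, it may help to see how it could be computed. Below is a rough finite-difference sketch, not the method of this paper, which integrates the Parisi PDE backward from the terminal condition $\Phi_\gamma(1,x)=|x|$ on a truncated spatial domain (the grid sizes, boundary handling, and the piecewise-constant $\gamma$ are illustrative assumptions):

```python
import numpy as np

def parisi_pde(xi2, gamma, x_max=10.0, nx=201, nt=2000):
    """Integrate the Parisi PDE backward in time by explicit finite differences.

    xi2(t) = xi''(t); gamma(t) is the order parameter. Returns Phi(0, x) on a grid.
    Grid sizes are chosen so the explicit scheme is stable for xi'' of order 1.
    """
    x = np.linspace(-x_max, x_max, nx)
    dx = x[1] - x[0]
    dt = 1.0 / nt
    phi = np.abs(x)                         # terminal condition Phi(1, x) = |x|
    for k in range(nt, 0, -1):
        t = k * dt
        phi_x = np.gradient(phi, dx)        # ∂_x Phi
        phi_xx = np.zeros_like(phi)         # crude boundary handling: zero at edges
        phi_xx[1:-1] = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / dx**2
        # backward Euler step: Phi(t-dt) = Phi(t) + (xi''/2)(Phi_xx + gamma Phi_x^2) dt
        phi = phi + 0.5 * xi2(t) * (phi_xx + gamma(t) * phi_x**2) * dt
    return x, phi

# Example: SK mixture xi(z) = z^2/2 (so xi'' = 1) with a one-step RSB order parameter.
x, phi0 = parisi_pde(lambda t: 1.0, lambda t: 0.8 * (t > 0.4))
```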
Throughout the paper, $\gamma^U_*$ will always refer to the minimizer of Theorem 1. We now turn to algorithms. In [Mon19], Montanari introduced the class of incremental approximate message passing (IAMP) algorithms to optimize the SK model. These are a special form of the well-studied approximate message passing (AMP) algorithms, reviewed in Subsection 2.1. The work [AMS21] showed that when $L_h = \delta_0$, so there is no external field, the maximum asymptotic value of $H_N$ achievable by IAMP algorithms is given by the minimum of $\mathcal P$, assuming it is attained, over a larger class of non-monotone functions. This larger class is:
$$L = \Big\{\gamma:[0,1)\to\mathbb R_{\ge 0}\ \text{right-continuous}:\ \|\gamma\|_{TV[0,t]}<\infty\ \forall t\in[0,1),\ \int_0^1\xi''(t)\gamma(t)\,dt<\infty\Big\}.$$
Here $\|\cdot\|_{TV[0,t]}$ denotes the total variation norm
$$\|\gamma\|_{TV[0,t]} = \sup_{0=t_0<t_1<\dots<t_k=t}\ \sum_{i=1}^{k}\big|\gamma(t_i)-\gamma(t_{i-1})\big|.$$
The Parisi PDE (1.2) and associated SDE extend also to $L$. We denote by $\gamma^L_*\in L$ the minimizer of $\mathcal P$ over $L$, assuming that it exists. Note that uniqueness always holds by Lemma 1.4 below. We remark that [AMS21] does not include the right-continuity constraint in defining $L$; however this constraint only serves to eliminate ambiguities between $\gamma_1, \gamma_2$ differing on sets of measure 0. In fact [AMS21] assumes $\gamma\in L$ is right-continuous from Lemma 6.9 onward.
We clarify our use of the word "efficient" in Subsection 2.1; in short, it means that $O_\varepsilon(1)$ evaluations of $\nabla\widetilde H_N$ and of the first and second partial derivatives of $\Phi_{\gamma_*}$ are required. We say $\gamma_*\in L$ is $q$-optimizable if
$$\mathbb E\big[(\partial_x\Phi_{\gamma_*}(t,X_t))^2\big] = t \quad \text{for all } t\in[q,1), \tag{1.5}$$
and we say $\gamma_*\in L$ is optimizable if it is $q$-optimizable for $q = \inf(\text{supp}(\gamma_*))$. We say that $(\xi, L_h)$ is optimizable, or equivalently that the no overlap gap property holds for $(\xi, L_h)$, if the function $\gamma^U_*$ is optimizable. In [Mon19], the widely believed conjecture that (in our language) the Sherrington–Kirkpatrick model with $\xi(x) = x^2/2$ is optimizable was assumed. Our preliminary numerical simulations suggest that the SK model remains optimizable with any constant external field $L_h = \delta_h$. However even without external field, proving this conjecture rigorously seems to be very difficult.
For $q\in[0,1)$, let $L_q = \{\gamma\in L : \inf(\text{supp}(\gamma))\ge q\}$ consist of functions in $L$ vanishing on $[0,q)$. The next lemma shows optimizability is equivalent to minimizing $\mathcal P$ over either $L$ or $L_q$. It is related to results in [AC15, JT16, AMS21] which show that $\gamma^U_*$ and $\gamma^L_*$ satisfy (1.5), in the former case when $t$ is a point of increase for $\gamma^U_*$. The proof is given in Section 7.

P(γ * ) = inf γ∈Lq P(γ).
Moreover if a minimizer exists in either variational problem just above, then it is unique.
Lemma 1.4 implies that any optimizable $\gamma_*$ is in fact the unique minimizer $\gamma^L_*\in L$ of the Parisi functional. However throughout much of the paper we will use $\gamma_*$ to denote a general optimizable function without making use of this result. We made this choice because while Lemma 1.4 is important to make sense of our results, it is not necessary for proving e.g. Theorem 3 below. We now state our main results.
Theorem 3. Suppose $\gamma_*\in L$ is optimizable. Then for any $\varepsilon>0$ there exists an efficient AMP algorithm which outputs $\sigma\in\Sigma_N$ such that
$$\frac{H_N(\sigma)}{N} \ge \mathcal P(\gamma_*) - \varepsilon$$
with probability tending to 1 as $N\to\infty$.
Lemma 1.5. If $\gamma^U_*$ strictly increases on $[q,1)$ for $q = \inf(\text{supp}(\gamma^U_*))$, then no overlap gap holds, i.e. $\gamma^U_*$ is optimizable.

Corollary 1.6. Suppose no overlap gap holds. Then for any $\varepsilon>0$ an efficient AMP algorithm outputs $\sigma\in\Sigma_N$ satisfying
$$\frac{H_N(\sigma)}{N} \ge GS(\xi, L_h) - \varepsilon$$
with probability tending to 1 as $N\to\infty$.

Remark 1.7. Unlike for $U$, the infimum $\inf_{\gamma\in L}\mathcal P(\gamma)$ need not be achieved, i.e. an optimizable $\gamma_*$ need not exist. For instance, one has $\xi''(0) = 0$ whenever $c_2 = 0$. On the other hand if $\gamma$ is optimizable, Corollary 7.1 and Lemma 7.6 (with $q=0$) yield an identity in which, in light of Lemma 7.2, the left-hand integrand is $O(\xi''(s)) = o(1)$; hence the identity cannot hold for small $t$. Thus if $c_2 = 0$ there exists no optimizable $\gamma_*$. We conjecture that conversely a minimizing $\gamma^L_*\in L$ exists whenever $c_2 > 0$, but we do not have a proof.

Remark 1.8. By the symmetry of $H_N$, the external field can also be a deterministic vector $h = (h_1,\dots,h_N)$. As long as the empirical distribution of the values converges and the external field is independent of $\widetilde H_N$, exactly the same results hold; see [AMS21, Appendix A, Theorem 6] for the corresponding AMP statement in this slightly more general setting.

Optimizability and No Overlap Gap
In contrast to Corollary 1.6, the paper [GJ21] rules out approximate maximization using AMP for pure $p$-spin models without external field based on an overlap gap property whenever $p\ge 4$ is even. In formulating this result, [GJ21] defines an AMP algorithm to be any iteration of the form given in Subsection 2.1 with Lipschitz non-linearities $f_0, f_1,\dots,f_\ell$ which outputs $\sigma = \max(-1,\min(1, f_\ell(z^0,\dots,z^\ell)))$, the final iterate $f_\ell$ projected into $[-1,1]^N$. Here $\ell$ is a large constant which cannot grow with $N$. (In [AMS21] and the present work, the final iterate is rounded to be in $\Sigma_N$, but this step does not change the asymptotic value of $H_N$ and is essentially irrelevant; see for instance Equation (3.14).) In fact for a broad class of models, their main result based on the overlap gap property applies exactly when $\gamma^U_*$ is not optimizable. This justifies our definition of $(\xi, L_h)$ as having "no overlap gap" if and only if it is optimizable.

Proposition 1.9. Suppose $\gamma^U_*$ is not optimizable, where $\xi$ is an even polynomial and the external field $L_h = \delta_h$ is constant. Then there is $\varepsilon>0$ such that for any AMP algorithm with random output $\sigma$,
$$\lim_{N\to\infty}\mathbb P\Big[\frac{H_N(\sigma)}{N}\ge GS(\xi, L_h) - \varepsilon\Big] = 0.$$

The proof of Proposition 1.9 is identical to that of [GJ21, Theorem 3.3] and we give an outline in Section 6. Taken together, Corollary 1.6 and Proposition 1.9 exactly characterize the mean-field Ising spin glasses for which approximate maximization is possible by AMP, at least when $\xi$ is even and the external field is constant. We remark that similar lower bounds were studied for the class of constant-degree polynomial algorithms in [GJW20]. These results also extend to any non-optimizable Ising spin glass with even $\xi$ and constant $h$, ruling out approximate maximization algorithms in a slightly weaker sense. Constant-degree polynomials encompass AMP in most cases by approximating each non-linearity $f_\ell$ by a polynomial in a suitable sense, see e.g. [Mon19, Theorem 6].
We conclude this subsection with a brief discussion of our terminology. Our definition of optimizability is closely related to "full" or "continuous" replica symmetry breaking. For example, the definitions of full RSB used in [Mon19, Sub21] essentially coincide with 0-optimizability. However these terms seem to be slightly ambiguous, as they can also refer to functions $\gamma^U_*$ which are strictly increasing on some nontrivial interval instead of being piecewise constant as in finite RSB. For example, the physics paper [CKP+14] describes "the case where the function ∆(x) is allowed to have a continuous part: this can be thought as an appropriate limit of the k-RSB construction when k → ∞ and is therefore called" full RSB.

Branching IAMP and Spherical Spin Glasses
Under no overlap gap, one expects that any finite ultrametric space of diameter at most $\sqrt{2(1-q)}$ (with size independent of $N$) can be realized by approximate maximizers of $H_N$. In fact a modification of our $q$-IAMP algorithm is capable of explicitly producing such realizations. In Section 4 we give a branching $q$-IAMP algorithm which for any finite ultrametric space $X$ and optimizable $\gamma_*$ constructs points $(\sigma^x)_{x\in X}$ such that $\frac{H_N(\sigma^x)}{N}\simeq \mathcal P(\gamma_*)$ and $\frac{\|\sigma^x-\sigma^y\|_2}{\sqrt N}\simeq d_X(x,y)$ for each $x,y\in X$. Recall that an ultrametric space $X$ is a metric space which satisfies the ultrametric triangle inequality
$$d_X(x,z) \le \max\big(d_X(x,y),\, d_X(y,z)\big) \quad \text{for all } x,y,z\in X.$$
Moreover any finite ultrametric space can be canonically identified with the leaf set of a rooted tree, see e.g. [BD98]; a sketch of this identification follows below.
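To illustrate the tree identification (a standard construction, not specific to this paper): for every radius $r$ the relation $d_X(x,y)\le r$ is an equivalence relation by the ultrametric inequality, and the resulting nested partitions are exactly the levels of the rooted tree. A minimal sketch, assuming the distances are supplied as a callable:

```python
def ultrametric_tree(points, d):
    """Return the nested partitions of `points` induced by an ultrametric d(x, y).

    By the ultrametric inequality, {d(x, y) <= r} is an equivalence relation for
    every r, so the partitions are nested and form a rooted tree whose leaves
    are the points themselves.
    """
    scales = sorted({d(x, y) for x in points for y in points if x != y})
    partitions = [[{x} for x in points]]          # leaves: the finest partition
    for r in scales:
        merged = []
        for block in partitions[-1]:
            home = next((b for b in merged
                         if d(next(iter(b)), next(iter(block))) <= r), None)
            if home is not None:
                home |= block                     # any representative works
            else:
                merged.append(set(block))
        partitions.append(merged)
    return partitions                             # partitions[-1] is the root

# Example: three points with d(a,b) = 1 and d(a,c) = d(b,c) = 2.
dist = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 2}
d = lambda x, y: dist.get((x, y)) or dist.get((y, x))
print(ultrametric_tree(["a", "b", "c"], d))
```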
The idea is to occasionally reset the IAMP part of the algorithm with external randomness. A similar strategy was proposed but not analyzed in [AM20].
Theorem 4. Let $\gamma_*\in L$ be optimizable, and fix a finite ultrametric space $(X, d_X)$ with diameter at most $\sqrt{2(1-q)}$ as well as $\varepsilon>0$. Then an efficient AMP algorithm constructs points $(\sigma^x)_{x\in X}$ in $\Sigma_N$ satisfying
$$\frac{H_N(\sigma^x)}{N}\ge \mathcal P(\gamma_*) - \varepsilon \quad \text{and} \quad \Big|\frac{\|\sigma^x-\sigma^y\|_2}{\sqrt N} - d_X(x,y)\Big| \le \varepsilon \quad \text{for all } x,y\in X,$$
with probability tending to 1 as $N\to\infty$.
In Section 5 we give corresponding results for spherical spin glasses, extending [Sub21] to the case of non-trivial external field. At zero temperature, [CS17, Theorem 1] determines the free energy in spherical spin glasses based on a positive, non-decreasing function $\alpha:[0,1)\to[0,\infty)$ as well as a constant $L$. (See also [JT17] for related results.) More precisely, they show the asymptotic ground state energy is given by the unique minimizer of the variational problem (1.6). The associated definition of no overlap gap is as follows.
Definition 1.10. The spherical mixed $p$-spin model is said to have no overlap gap if for some $q_{sph}\in[0,1)$, the unique minimizing $\alpha\in U$ in (1.6) is strictly increasing on $[q_{sph},1)$ and satisfies $\alpha(q) = 0$ for $q\le q_{sph}$.

Unlike the Ising case, we do not formulate a generalized variational principle and only show how to achieve a natural energy value, which coincides with the ground state energy when no overlap gap holds by [CS17, Proposition 2]. We also exactly characterize the spherical models exhibiting no overlap gap, which slightly extends the same result.
Remark 1.12. Subsequently to the present work and in collaboration with Brice Huang, we showed in [HS21] that the algorithms presented in this paper are optimal in some sense. More precisely, it is shown that Proposition 1.9 can be strengthened to
$$\lim_{N\to\infty}\mathbb P\Big[\frac{H_N(\sigma)}{N}\ge \inf_{\gamma\in L}\mathcal P(\gamma) + \varepsilon\Big] = 0 \tag{1.7}$$
for any $\varepsilon>0$. Here, as in Proposition 1.9, $\sigma\in[-1,1]^N$ is the output of an AMP algorithm whose number of iterations is independent of $N$. This result applies more generally to arbitrary algorithms with suitably Lipschitz dependence on the disorder variables defining $H_N$. In the spherical case, [HS21] similarly shows that the energy attained in Theorem 6 is asymptotically best possible for Lipschitz algorithms; see Proposition 2.2 therein. The essential idea of [HS21] is to consider general finite ultrametric spaces (with size independent of $N$) of points $\sigma$ with large energy. They show that for $E > \inf_{\gamma\in L}\mathcal P(\gamma)$, the level sets do not contain approximately isometric embeddings of sufficiently complicated finite ultrametrics. (Technically, proving (1.7) requires a more complicated obstruction involving a correlated family of different Hamiltonians.) Theorem 4 is a sharp converse to, and was a key inspiration for, this result, as it constructs arbitrary finite ultrametric configurations at energy $\mathcal P(\gamma_*)$ for optimizable $\gamma_*$. See the introduction of [HS21] for further discussion and implications.

Technical Preliminaries
We will use ordinary lower-case letters for scalars ($m, x, \dots$) and bold lower-case for vectors ($\mathbf m, \mathbf x$). Ordinary upper-case letters are used for the state-evolution limits of AMP as in Proposition 2.3, such as $(X^\delta_j, Z^\delta_j, N^\delta_j)$, as well as for continuous-time stochastic processes such as $(X_t, Z_t, N_t)$. We denote limits in probability as $N\to\infty$ by $\text{p-lim}_{N\to\infty}(\cdot)$. We write $x\simeq y$ to indicate that $\text{p-lim}_{N\to\infty}(x-y) = 0$ where $x, y$ are random scalars.
We will use the ordinary inner product $\langle x,y\rangle = \sum_{i=1}^N x_i y_i$ as well as the normalized inner product $\langle x,y\rangle_N = \frac1N\sum_{i=1}^N x_i y_i$. Here $x = (x_1,\dots,x_N)\in\mathbb R^N$ and similarly for $y$. Associated with these are the norms $\|x\| = \sqrt{\langle x,x\rangle}$ and $\|x\|_N = \sqrt{\langle x,x\rangle_N}$. Often, for example in (2.3), we apply a scalar function $f$ to a vector $x\in\mathbb R^N$. This will always mean that $f$ is applied entrywise, i.e. $f(x_1,\dots,x_N) = (f(x_1),\dots,f(x_N))$. Similarly for a function $f:\mathbb R^{\ell+1}\to\mathbb R$, we define
$$f(x^0,\dots,x^\ell)_i = f(x^0_i,\dots,x^\ell_i), \qquad 1\le i\le N. \tag{2.1}$$
The following useful a priori estimate shows that all derivatives of $H_N$ have order 1 in the $\|\cdot\|_N$ norm. Note that we do not apply any non-standard normalization in the definitions of gradients, Hessians, etc. We use $\|\cdot\|_{op}$ to denote the operator norm of a tensor $T\in(\mathbb R^N)^{\otimes k}$ of arbitrary order $k$:
$$\|T\|_{op} = \max_{\|x^1\| = \dots = \|x^k\| = 1}\big|\langle T,\, x^1\otimes\cdots\otimes x^k\rangle\big|.$$

Proposition 2.1. Fix a mixture function $\xi$, external field distribution $L_h$, $k\in\mathbb Z^+$, $\eta\in\mathbb R^+$, and assume that the coefficients of $\xi$ decay exponentially. Then for suitable $C = C(\xi, L_h, k, \eta)$ and $c > 0$, with probability at least $1 - e^{-cN}$,
$$\sup_{\|x\|\le(1+\eta)\sqrt N}\big\|\nabla^k H_N(x)\big\|_{op} \le C\,N^{1-k/2}.$$

Review of Approximate Message Passing
Here we review the general class of approximate message passing (AMP) algorithms. AMP algorithms are a flexible class of efficient algorithms based on a random matrix or, in our setting, mixed tensor. To specify an AMP algorithm, we fix a probability distribution $p_0$ on $\mathbb R$ with finite second moment and a sequence $f_0, f_1,\dots$ of Lipschitz functions $f_\ell:\mathbb R^{\ell+1}\to\mathbb R$, with $f_{-1} = 0$. The functions $f_\ell$ will often be referred to as non-linearities. We begin by taking $z^0$ with i.i.d. coordinates $z^0_i\sim p_0$ and iterate
$$z^{\ell+1} = \nabla\widetilde H_N\big(f_\ell(z^0,\dots,z^\ell)\big) - \sum_{j=1}^{\ell} d_{\ell,j}\, f_{j-1}(z^0,\dots,z^{j-1}), \tag{2.3}$$
where, writing $M_k = f_k(Z_0,\dots,Z_k)$ below, the Onsager correction coefficients are
$$d_{\ell,j} = \xi''\big(\mathbb E[M_\ell M_{j-1}]\big)\,\mathbb E\big[\partial_{Z_j} f_\ell(Z_0,\dots,Z_\ell)\big]. \tag{2.4}$$
Here the non-linearity $f_\ell$ is applied coordinate-wise as in (2.1). Moreover $Z_0\sim p_0$ while $(Z_\ell)_{\ell\ge 1}$ is an independent centered Gaussian process with covariance $Q_{\ell,j} = \mathbb E[Z_\ell Z_j]$ defined recursively by
$$Q_{\ell+1,j+1} = \xi'\big(\mathbb E[M_\ell M_j]\big).$$
The key property of AMP, stated below in Proposition 2.3, is that for any $\ell$ the empirical distribution of the $N$ sequences $(z^0_i,\dots,z^\ell_i)$ converges to the law of $(Z_0,\dots,Z_\ell)$; this is known as state evolution. The first version of state evolution was given for Gaussian random matrices in [Bol14, BM11]. Since then it has been extended to more general situations in many works including [JM13, BLM15, BMN19, CL21, Fan22]. As state evolution holds for essentially arbitrary non-linearities $f_\ell$, it allows a great deal of flexibility in solving problems involving random matrices or tensors. We remark that [AMS21, Proposition 3.1] is phrased in terms of a random mixed tensor $W$, i.e. a sequence of $p$-tensors $(W^{(p)}\in(\mathbb R^N)^{\otimes p})_{p\ge 2}$; see Equation (3.2) therein. The two descriptions are equivalent because $W$ is constructed so that $\sum_{p\ge 2} c_p\langle W^{(p)}, x^{\otimes p}\rangle = \widetilde H_N(x)$. While the tensor language is better suited to proving state evolution, for our purposes it is more convenient to express AMP just in terms of $H_N$ and $\nabla\widetilde H_N$.
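As a concrete illustration, here is a minimal sketch of the iteration (2.3) in the pure 2-spin (SK) case, where $\nabla\widetilde H_N(m) = Am$ for a symmetric Gaussian matrix $A$ and the single Onsager coefficient reduces to $\mathbb E[f'(Z_\ell)]$. The nonlinearity $\tanh$, the nonlinearities depending only on the last iterate, and the parameter choices are illustrative assumptions, not the algorithm of this paper:

```python
import numpy as np

def amp_sk(A, f, fprime, iters=20, rng=None):
    """AMP iteration z^{l+1} = A f(z^l) - E[f'(Z_l)] f(z^{l-1}) for the SK model (xi'' = 1).

    The Onsager expectation is estimated empirically, which is asymptotically
    exact by state evolution.
    """
    N = A.shape[0]
    rng = rng or np.random.default_rng(0)
    z = rng.standard_normal(N)                   # z^0 with i.i.d. coordinates from p_0
    m_prev = np.zeros(N)
    for _ in range(iters):
        m = f(z)
        z = A @ m - fprime(z).mean() * m_prev    # Onsager-corrected step
        m_prev = m
    return f(z)

N = 2000
G = np.random.default_rng(1).standard_normal((N, N)) / np.sqrt(N)
A = (G + G.T) / np.sqrt(2)                       # symmetric couplings: grad H(m) ≈ A m
m = amp_sk(A, np.tanh, lambda x: 1 - np.tanh(x) ** 2)
print((m @ A @ m) / (2 * N))                     # normalized energy of the AMP output
```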
Let us finally discuss the efficiency of our AMP algorithms. The algorithms we give are described by parameters $q$ and $\ell$ and require oracle access to the function $\Phi_{\gamma_*}(t,x)$ and its derivatives. We do not address the complexity of computing $\Phi_{\gamma_*}(t,x)$. However, as stated in [Mon19, AMS21], it seems unlikely to present a major obstacle because solving for $\gamma^U_*$ is a convex problem which must be solved only once for each $(\xi, L_h)$. Moreover [AM20] demonstrates that these algorithms are practical to implement.
In the end, our algorithms output rounded points $\sigma$ with $\sigma_i = \text{sign}(f_\ell(z^0_i,\dots,z^\ell_i))$ for a large final index $\ell$ depending on the parameters. The outputs satisfy
$$\text{p-lim}_{N\to\infty}\frac{H_N(\sigma)}{N} = H_*$$
for some asymptotic energy value $H_*$. To achieve an $\varepsilon$-approximation to the value $H_*$, the parameters $q$ and $\ell$ must be sent to 1 and $\infty$ respectively, which requires a diverging number of iterations. In particular let $\chi$ denote the complexity of computing $\nabla\widetilde H_N$ at a point and let $\chi_1$ denote the complexity of computing a single coordinate of $\nabla\widetilde H_N$ at a point. Then the total complexity needed to achieve energy $H_* - \varepsilon$ is $O_\varepsilon(\chi + N\chi_1)$. When $\xi$ is a polynomial this complexity is linear in the size of the input specifying $H_N$; see the comments following [AMS21, Remark 2.1]. In the statements of our results, we refer to such algorithms as "efficient AMP algorithms".

Initializing AMP
Here we explain some technical points involved in initializing our AMP algorithms and why they arise. First, we would like to use a random external field $h_i$ which varies from coordinate to coordinate. In the most natural AMP implementation, this requires that the non-linearities $f_\ell$ correspondingly depend on the coordinate rather than being fixed, which is not allowed in state evolution. Second, we would like to use many i.i.d. Gaussian vectors throughout the branching version of the algorithm. However Proposition 2.3 allows only a single initial vector $z^0$ as a source of external randomness independent of $H_N$. One could prove a suitable generalization of Proposition 2.3, but we instead build these additional vectors into the initialization of the AMP algorithm as a sort of preprocessing phase. To indicate that our constructions here are preparation for the "real algorithm", we reparametrize so the preparatory iterates have negative index. We begin by taking $p_0 = L_h$ to be the distribution of the external field itself, initializing $z^{-K} = h$, and choosing preparatory non-linearities so that the subsequent iterates behave as centered Gaussian vectors with variance $c$, for some constant $c>0$ which the algorithm is free to choose. (Note that the functions $f_{-k}$ correspond to entry-wise applications of the form in (2.1).) State evolution immediately implies the following.

Proposition 2.4. In the state evolution $N\to\infty$ limit, the empirical distribution of $(z^{-K}, z^{-K+1},\dots,z^0)$ converges to the law of $(h, Z_{-K+1},\dots,Z_0)$, where the $Z_j\sim N(0,c)$ are i.i.d. and independent of $h\sim L_h$.

In fact taking $K = 1$ suffices for the main construction in the paper. In Section 4 we require larger values of $K$ for branching IAMP, where the iterates $(z^{-K+1},\dots,z^{-1})$ serve as proxies for i.i.d. Gaussian vectors.
Remark 2.5. Because the sum defining the Onsager correction term in (2.3) starts at $j = 1$, the effect of the external field $h_i$ on future AMP iterates does not enter into any Onsager correction terms in this paper.

Properties of the Parisi PDE and SDE
Quite a lot is known about the solution $\Phi_\gamma$ to the Parisi PDE. The next results hold for any $\gamma\in L$ and are taken from [AMS21]. Similar results for $\gamma\in U$ appear in [AC15, JT16].
Lemma 2.9. If $\gamma_*\in L$ is $q$-optimizable then it satisfies the identity (2.7) for $t\ge q$, together with the companion inequality (2.8).

Remark 2.10. We expect (2.8) to hold with equality; if this is true, then our analysis in Subsection 3.3 shows that Theorem 3 holds as a two-sided estimate. Conversely, the main result of [HS21] implies such a two-sided estimate when $\xi$ is even; retracing Subsection 3.3 then implies (2.8) is indeed an equality in such cases. However this is unsatisfyingly indirect and it would be interesting to give a direct proof.

The Main Algorithm
In this section we explain our main AMP algorithm and prove Theorem 3. Throughout we take $\gamma_*\in L$ to be $q$-optimizable for $q = \inf(\text{supp}(\gamma_*))\in[0,1)$.

Phase 1: Finding the Root
Here we give the first phase of the algorithm, which proceeds for a large constant number of iterations after initialization and approximately converges to a fixed point. The AMP iterates during this first phase are denoted by $(w^k)_{-K\le k\le \underline\ell}$, where $\underline\ell$ denotes the final index of the phase. We rely on the function
$$f(x) = \partial_x\Phi_{\gamma_*}(q, x)$$
and use non-linearities $f_k(z^{-K},\dots,w^k) = f(w^k + h)$, where $f(\cdot)$ is evaluated entrywise as explained in (2.1). Proposition 2.6 implies that each $f_k$ is Lipschitz, so that state evolution applies to the AMP iterates. In the initialization phase we take $c = \xi'(q)$ as described in Subsection 2.2, so that the coordinates $w^0_i$ are asymptotically distributed as centered Gaussians with variance $\xi'(q)$ in the $N\to\infty$ limit. Moreover we set $m^k = f(x^k)$ where $x^k = w^k + h$. This yields the iteration
$$w^{k+1} = \nabla\widetilde H_N(m^k) - d_k\, m^{k-1},$$
with $d_k$ the appropriate Onsager coefficient from (2.4).
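In pseudocode, phase one is a plain AMP fixed-point iteration. The following schematic sketch treats `grad_H`, `f`, and `d_onsager` as placeholder oracles for $\nabla\widetilde H_N$, $\partial_x\Phi_{\gamma_*}(q,\cdot)$, and the Onsager coefficients; for simplicity it also draws $w^0$ as a fresh Gaussian rather than producing it by the preprocessing phase of Subsection 2.2:

```python
import numpy as np

def phase_one(grad_H, f, d_onsager, h, ell, c, rng=None):
    """Phase 1: iterate w^{k+1} = grad_H(m^k) - d_k m^{k-1} toward a fixed point.

    grad_H:    callable, gradient of the centered Hamiltonian.
    f:         callable applied entrywise; plays the role of d/dx Phi_{gamma*}(q, .).
    d_onsager: callable giving the Onsager coefficient d_k at step k.
    """
    N = len(h)
    rng = rng or np.random.default_rng(0)
    w = np.sqrt(c) * rng.standard_normal(N)    # w^0: coordinates of variance c = xi'(q)
    m_prev = np.zeros(N)
    for k in range(ell):
        m = f(w + h)                           # m^k = f(x^k) with x^k = w^k + h
        w = grad_H(m) - d_onsager(k) * m_prev  # AMP step with Onsager correction
        m_prev = m
    return f(w + h)                            # approximate root of the TAP tree
```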
Lemma 3.1. For $f$ as defined above, $h\sim L_h$ and $Z\sim N(0,1)$ an independent standard Gaussian,
$$\mathbb E\Big[f\big(h + \sqrt{\xi'(q)}\,Z\big)^2\Big] = q \quad \text{and} \quad \mathbb E\Big[f'\big(h + \sqrt{\xi'(q)}\,Z\big)^2\Big] = \frac{1}{\xi''(q)}.$$

Proof. The identities follow by taking $t = q$ in the definition of optimizability as well as Lemma 2.9. Here we use the fact that $X_q\sim h + \sqrt{\xi'(q)}\,Z$, since $\gamma_*$ vanishes on $[0,q)$.

Next, with $(Z, Z', Z'')\sim N(0, I_3)$ independent of $h\sim L_h$, define for $t\le\xi'(q)$ the function
$$\psi(t) = \xi'\big(\phi(t)\big), \qquad \phi(t) = \mathbb E\Big[f\big(h + \sqrt t\,Z + \sqrt{\xi'(q)-t}\,Z'\big)\, f\big(h + \sqrt t\,Z + \sqrt{\xi'(q)-t}\,Z''\big)\Big].$$

Lemma 3.2. The function $\psi$ is strictly increasing and strictly convex on $[0,\xi'(q)]$. Moreover $\psi(\xi'(q)) = \xi'(q)$ and $\psi'(\xi'(q)) = 1$. Finally $\psi(t) > t$ for all $t < \xi'(q)$.
Proof. Using Gaussian integration by parts as in [Bol14, Lemma 2.2], we find
$$\phi'(t) = \mathbb E\big[f'(Y_1)\,f'(Y_2)\big], \qquad \phi''(t) = \mathbb E\big[f''(Y_1)\,f''(Y_2)\big],$$
where $Y_1, Y_2$ denote the two arguments appearing in the definition of $\phi$. These expressions are each strictly positive, as the optimizability of $\gamma_*$ implies that $f', f''$ are not identically zero. Therefore $\phi$ is increasing and convex. Since $\xi'$ is also increasing and convex (being a power series with non-negative coefficients) we conclude the same about their composition $\psi$. The values $\psi(\xi'(q)) = \xi'(q)$ and $\psi'(\xi'(q)) = 1$ follow from Lemma 3.1 and the chain rule. Finally the last claim follows by strict convexity of $\psi$ and $\psi'(\xi'(q)) = 1$: for $t < \xi'(q)$ we get $\psi(t) > \psi(\xi'(q)) + \psi'(\xi'(q))\,(t - \xi'(q)) = t$.
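The role of $\psi$ in the convergence analysis is exactly this fixed-point structure: iterating $a_{k+1} = \psi(a_k)$ from below increases monotonically to the fixed point, which for the true $f$ of Lemma 3.1 is $\xi'(q)$. A toy numerical sketch using $\tanh$ as a stand-in for $f$ and $\xi(z) = z^2/2$ (so the limit is merely some fixed point of $\psi$; this is essentially Bolthausen's iterative construction [Bol14] rather than the exact objects of this paper):

```python
import numpy as np

def make_psi(f, xi_prime, c, h=0.5, n_mc=200_000, seed=0):
    """Monte Carlo evaluation of psi(t) = xi'(phi(t)), where
    phi(t) = E[ f(h + sqrt(t) Z + sqrt(c-t) Z') * f(h + sqrt(t) Z + sqrt(c-t) Z'') ].
    Here c plays the role of xi'(q) and h of a (constant) external field."""
    rng = np.random.default_rng(seed)
    Z, Zp, Zpp = rng.standard_normal((3, n_mc))
    def psi(t):
        s = np.sqrt(max(c - t, 0.0))
        y1 = f(h + np.sqrt(t) * Z + s * Zp)
        y2 = f(h + np.sqrt(t) * Z + s * Zpp)
        return xi_prime(np.mean(y1 * y2))
    return psi

# Toy mixture xi(z) = z^2/2, so xi'(z) = z.
psi = make_psi(np.tanh, lambda z: z, c=1.0)
a = 0.0
for _ in range(50):
    a = psi(a)    # a_{k+1} = psi(a_k) increases monotonically to a fixed point
print(a)
```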
Lemma 3.3. For all non-negative integers $0\le j < k$, the equalities (3.6)–(3.9) hold.

Proof. We proceed by induction on $j$, first showing (3.6) and (3.8) together. As a base case, (3.6) holds for $j = 0$ by initialization. For the inductive step, assume first that (3.6) holds for $j$. Then state evolution and (3.5) yield that (3.6) implies (3.8) for each $j\ge 0$. On the other hand, state evolution directly implies that if (3.8) holds for $j$ then (3.6) holds for $j+1$. This establishes (3.6) and (3.8) for all $j\ge 0$. We similarly show (3.7) and (3.9) together by induction, beginning with (3.7) when $j = 0$. By the initialization of Subsection 2.2 it follows that the random variables $h, W_{-1}, W_0$ are jointly independent. State evolution implies that $W_{k-1}$ is independent of $W_{-1}$ for any $k\ge 0$, and then yields the base case for any $k\ge 1$. Just as above, it follows from state evolution that (3.7) for $(j,k)$ implies (3.9) for $(j,k)$, which in turn implies (3.7) for $(j+1, k+1)$. Hence induction on $j$ proves (3.7) and (3.9) for all $(j,k)$. Finally the last independence assertion is immediate from state evolution just because $h$ is the first step in the AMP iteration.
We now compute the limiting energy from the first phase. Since the first phase is similar to many "standard" AMP algorithms, this step is comparable to the computation of their final objective value, for example [DAM17, Lemma 6.3]. This computation is straightforward when $\widetilde H_N$ is a homogeneous polynomial of degree $p$, because one can just rearrange the equation for an AMP iteration to solve for the energy. However it requires more work in our setting because $\nabla\widetilde H_N$ acts differently on terms of different degrees. We circumvent this mismatch by applying state evolution to a $t$-dependent auxiliary AMP step and integrating in $t$.
Lemma 3.5. With $X_t$ the Parisi SDE (1.1), the limiting first-phase energy $\lim_{k\to\infty}\text{p-lim}_{N\to\infty}\frac{H_N(m^k)}{N}$ admits three equivalent expressions in terms of the law of $X_q$.

Proof. The equivalence of the latter two expressions follows from the fact that $X_q\sim X_0 + N(0,\xi'(q))$, so we focus on the first equality. Observe that
$$\frac{\widetilde H_N(m^k)}{N} = \int_0^1 \big\langle m^k, \nabla\widetilde H_N(t\,m^k)\big\rangle_N\,dt$$
holds for any vector $m^k$, by considering each monomial term of $\widetilde H_N$: a degree-$p$ monomial contributes $\int_0^1 p\,t^{p-1}\,dt = 1$ times its value at $m^k$. Our main task now reduces to computing the in-probability limit of the integrand as a function of $t$. Proposition 2.1 ensures that $t\mapsto\langle m^k, \nabla\widetilde H_N(t m^k)\rangle_N$ is Lipschitz assuming $\|m^k\|_N\le 1 + o(1)$. This holds with high probability for each $k$ as $N\to\infty$ by state evolution and Proposition 2.7, so we may freely interchange the limit in probability with the integral.
To compute the integrand $\langle m^k, \nabla\widetilde H_N(t m^k)\rangle_N$ we analyze a modified AMP which agrees with the AMP we have considered so far up to step $k$, whereupon we replace the non-linearity by its multiple by $t$ for arbitrary $t\in(0,1)$. We obtain a new iterate $y^{k+1}(t)$, and rearranging the resulting AMP step expresses $\langle m^k, \nabla\widetilde H_N(t m^k)\rangle_N$ in terms of $\langle m^k, y^{k+1}(t)\rangle_N$ plus an Onsager contribution. We evaluate the $N\to\infty$ limit in probability of the first term via the state evolution limits $W_k, X_k, Y_{k+1}(t)$; since $h$ is independent of $(W_k, Y_{k+1})$ we use Gaussian integration by parts, and then integrate with respect to $t$. Finally the first term in (3.10) gives an energy contribution governed by $\lim_{k\to\infty} a_{k-1} = \xi'(q)$; since $\psi(\xi'(q)) = \xi'(q)$, combining concludes the proof.

Phase 2: Incremental AMP
We now switch to IAMP, which has a more complicated definition. We will begin from the iterates $x^{\underline\ell}, m^{\underline\ell}$ of phase 1 for a large $\underline\ell\in\mathbb Z^+$; here and below we write $\underline\ell$ for the final index of the first phase and $\overline\ell$ for the final index of the second phase, reserving $\ell$ for a generic index. Our implementation follows that of [AMS21, AS20] and we relegate several proofs to Section 8. First define the functions $u$ and $v$ built from the Parisi PDE solution $\Phi_{\gamma_*}$ as in those works. Define the sequence $(q^\delta_\ell)_{\ell\ge\underline\ell}$ by $q^\delta_\ell = q + (\ell - \underline\ell)\delta$. Fix $\overline q\in(q,1)$; the value $\overline q$ will be taken close to 1 after sending $\underline\ell\to\infty$. In particular we will assume $\delta < 1 - \overline q$ holds and set $\overline\ell = \min\{\ell\in\mathbb Z^+ : q^\delta_\ell\ge\overline q\}$. Also set $z^{\underline\ell} = w^{\underline\ell}$ and $n^{\underline\ell} = m^{\underline\ell}$. So far, we have defined $(x^{\underline\ell}, z^{\underline\ell}, n^{\underline\ell})$. We turn to inductively defining the triples $(x^\ell, z^\ell, n^\ell)$ for $\underline\ell\le\ell\le\overline\ell$. First, the values $(z^\ell)_{\ell\ge\underline\ell}$ are defined as AMP iterates via (3.11) (the non-linearities $f_\ell$ will be specified below). The sequence $(x^{\ell+1})_{\ell\ge\underline\ell}$ is defined by an Euler-type discretization (3.12) of the Parisi SDE, driven by the increments $z^{\ell+1} - z^\ell$ and the drift $v$. As usual, $v(q^\delta_j,\cdot)$ is applied component-wise so that $v(q^\delta_j, x^j)_i = v(q^\delta_j, x^j_i)$. Next, using the scalar function $u$, we consider for $\ell\ge\underline\ell$ the recursive definition of $n^\ell$, and we define the non-linearity $f_\ell:\mathbb R^{\ell+1}\to\mathbb R$ to recursively satisfy $f_\ell(z^{-K},\dots,z^\ell) = n^\ell$. It is not difficult to verify that the equations above form a "closed loop" uniquely determining the sequence $(x^\ell, z^\ell, n^\ell)_{\ell\ge\underline\ell}$. Since $(x^\ell_i, n^\ell_i)$ is determined by the sequence $(z^{\underline\ell}_i,\dots,z^\ell_i)$ we may define the state evolution limiting random variables $(X^\delta_\ell, N^\delta_\ell, Z^\delta_\ell)_{\ell\ge\underline\ell}$. We emphasize that the IAMP just defined is part of the same $q$-AMP algorithm as the first phase defined in the previous subsection. However the variable naming has changed so that the main iterates are $z^\ell$ for $\ell\ge\underline\ell$ rather than $w^\ell$ for $\ell\le\underline\ell$. In particular there is no problem in applying state evolution even though the two AMP phases take different forms.
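Schematically, phase two is an Euler-type scheme for the Parisi SDE driven by AMP innovations. The sketch below only mirrors the update structure; the precise recursions (3.11)–(3.12) couple $x^\ell, z^\ell, n^\ell$ through the non-linearities $f_\ell$, and `amp_increment`, `u`, `v` are placeholder oracles (assumptions for illustration):

```python
import numpy as np

def phase_two(x0, q, q_bar, delta, u, v, amp_increment):
    """Incremental AMP: discretized Parisi SDE driven by AMP increments.

    x0:  output of phase 1 (the "root" iterate); q, q_bar: start/end overlaps.
    u, v: scalar functions applied entrywise (from the Parisi PDE solution).
    amp_increment(l, x, n): returns the innovation z^{l+1} - z^l, an
    Onsager-corrected gradient step whose conditional variance per coordinate
    is approximately xi'(t + delta) - xi'(t).
    """
    x, n = x0.copy(), np.zeros_like(x0)
    t, l = q, 0
    while t < q_bar:
        dz = amp_increment(l, x, n)
        x = x + v(t, x) * delta + dz     # SDE discretization for X_t
        n = n + u(t, x) * dz             # martingale coordinate used for the output
        t, l = t + delta, l + 1
    return np.sign(n)                    # round to the cube Σ_N
```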
To complete the algorithm, we output the coordinate-wise sign $\sigma = \text{sign}(n^{\overline\ell})$, where sign is applied entrywise. The key to analyzing the AMP algorithm above is an SDE description in the $\delta\to 0$ limit, phrased in terms of the filtration (3.13) of the state evolution limiting process.

Computing the Final Energy
In this subsection we establish Theorem 3 by showing $\lim_{\overline q\to 1}\lim_{\underline\ell\to\infty}\text{p-lim}_{N\to\infty}\frac{H_N(\sigma)}{N} = \mathcal P(\gamma_*)$. First we show that the replacements $m^{\overline\ell}\to n^{\overline\ell}$ and $n^{\overline\ell}\to\sigma$ have negligible effect on the asymptotic value attained.
In the next lemma, proved in Section 8, we compute the energy gain during IAMP.
We now put everything together. Recall the value of the first-phase energy from Lemma 3.5. Proposition 2.8 implies that the process $\partial_x\Phi_{\gamma_*}(t,X_t)$ is a martingale, while Lemma 2.9 gives $\mathbb E[u(t,X_t)]$ explicitly. Using again that $\mathbb E[u(t,X_t)] = \int_t^1\gamma_*(s)\,ds$, the right-hand side of (3.16) can be computed in closed form; combining with Lemma 3.8 yields (3.17). Having estimated the limiting energy achieved by our $q$-AMP algorithm, it remains to verify that the value in Equation (3.17) recovers $\mathcal P(\gamma_*)$ in the appropriate limits. We also use some identities computed in [AMS21].

Proposition 3.10 ([AMS21, Lemma 6.13]). For any $\gamma\in L$, the identities recorded in [AMS21, Lemma 6.13] hold; we use them below.

Lemma 3.11. For $h\sim L_h$, $\overline q\ge q$, and $X_t$ as in (1.1), an explicit identity relates the quantities above.

Proof. Writing out the definitions and rearranging the resulting telescoping expression, the claim follows since $X_0 = h$.

Constructing Many Approximate Maximizers
Here we explain the modifications needed for branching IAMP and Theorem 4. The proofs are a slight extension of those for the main algorithm, and in fact we give many proofs for IAMP directly in this more general setting in Section 8. Let us fix values $Q = (q_1,\dots,q_m)$ with $q\le q_1 < \dots < q_m < 1$ and an index $B\in[m]$. To construct a pair of approximate maximizers with overlap $q_B$ we first construct $n^{\underline\ell}$ exactly as in Subsection 3.1. For each $i < B$, set $g^{(q_i,1)} = g^{(q_i,2)} = z^{-k_{i,1}} = z^{-k_{i,2}}\in\mathbb R^N$ for some $k_{i,1} = k_{i,2}\le K$ as in Subsection 2.2.
For each $B\le i\le m$, set $g^{(q_i,1)} = z^{-k_{i,1}}$ and $g^{(q_i,2)} = z^{-k_{i,2}}$ where $k_{i,1}\ne k_{i,2}$. Because the vectors $g^{(q_i,a)}$ are constructed using AMP, we require some additional conditions. We impose the separation condition
$$k_{i,a} - \ell^\delta_{q_i} > k_{j,a'} - \ell^\delta_{q_j} > 0 \tag{4.1}$$
for all $i > j$ and $a, a'\in\{1,2\}$. (In particular, it implies that $k_{i,a'}\ne k_{j,a}$ for $i\ne j$.) It is not hard to satisfy (4.1) by choosing the values $k_{i,a}$ sequentially in increasing order of $i$. Finally we insist that $\max_{i,a}(k_{i,a}) + \overline\ell + 1 < K$, where $h = z^{-K}$ was the AMP initialization; this is satisfied by choosing $K$ large at the end.
Having fixed this setup, we proceed by defining $m^{k,1} = m^{k,2} = m^k$ for $k\ge 0$ exactly as in the original first phase. Next we generate two sequences of IAMP iterates using (3.12), except at times corresponding to $q_i\in Q$, altogether generating $n^{\ell,a}$ for $\ell > \underline\ell$ and $a\in\{1,2\}$ via (4.2). Recalling Subsection 2.2, this is an AMP algorithm of the required form. The definitions of $x^{\ell,a}, z^{\ell,a}$ are the same as in e.g. (3.11) with superscript $a$ everywhere, though note that now the definition $f_{j-1}(z^{-K},\dots,z^0, z^{1,a},\dots,z^{j-1,a}) = n^{j-1,a}$ of $f_{j-1}$ has explicit dependence on the negatively indexed variables through $g^{(q_i,a)}$. The following result follows immediately from Lemmas 8.5 and 8.6 and readily implies Theorem 4.
Next we extend this construction to general finite ultrametric spaces $X$. Recall that any finite ultrametric space $X$ with all pairwise distances in the set $\{\sqrt{2(1-q_i)}\}_{i\in[m]}$ can be identified with a rooted tree $T$ whose leaves $\partial T$ are in bijection with $X$, and so that $d_X(x_i, x_j) = \sqrt{2(1-q_k)}$ is equivalent to leaves $i, j$ having least common ancestor at depth $k$.
Given such $T$, we may assign to each non-root vertex $u\in T$ a distinct initialization iterate $g^{(u)} = z^{-k_u}$. We require that:

1. The $k_u$ are pairwise distinct.
2. Analogously to (4.1), $k_u - \ell^\delta_{q_{\text{depth}(u)}} > k_v - \ell^\delta_{q_{\text{depth}(v)}} > 0$ whenever $\text{depth}(u) > \text{depth}(v)$.
Again, these conditions are easy to satisfy by choosing $k_u$ in increasing order of $\text{depth}(u)$, with ties broken arbitrarily. Then for each $x\in X$, we first construct $m^{k,x} = m^k$ for $k\ge 0$ exactly as in the original first phase, which does not depend on $x$. Next, denoting by $\text{root} = v_0, v_1,\dots,v_m = x\in\partial T = X$ the root-to-leaf path for $x$ within $T$, we compute the analog of (4.2), branching at each $v_j$ using the perturbation $g^{(v_j)}$. Again, $x^{\ell,x}, z^{\ell,x}$ are defined using the same recursions as before.
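The overall control flow can be sketched as a recursion over the tree. The sketch below is highly schematic: `iamp_segment` and `perturb` are hypothetical helpers standing in for the shared IAMP recursion (3.12) between consecutive overlaps and the injection of the pre-generated Gaussian proxy $g^{(u)}$ at a branch point, respectively:

```python
def branching_iamp(tree, n_root, iamp_segment, perturb):
    """Produce one IAMP output per leaf of `tree`, branching at internal vertices.

    tree:   nested dict {vertex: subtree}; leaves map to {}.
    n_root: iterate shared by all leaves (output of phase 1 at overlap q).
    iamp_segment(n, d_lo, d_hi): advances the common IAMP recursion.
    perturb(n, u): resets with the preprocessing Gaussian g^(u), as in (4.2).
    """
    outputs = {}
    def descend(vertex, subtree, n, depth):
        if not subtree:                        # leaf: record the finished iterate
            outputs[vertex] = n
            return
        for child, grandchildren in subtree.items():
            n_child = perturb(n, child)        # children decorrelate beyond q_depth
            n_child = iamp_segment(n_child, depth, depth + 1)
            descend(child, grandchildren, n_child, depth + 1)
    descend("root", tree, n_root, 0)
    return outputs
```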
Theorem 4. Let $\gamma_*\in L$ be optimizable, and fix a finite ultrametric space $(X, d_X)$ with diameter at most $\sqrt{2(1-q)}$ as well as $\varepsilon>0$. Then an efficient AMP algorithm constructs points $(\sigma^x)_{x\in X}$ in $\Sigma_N$ satisfying
$$\frac{H_N(\sigma^x)}{N}\ge \mathcal P(\gamma_*) - \varepsilon \quad \text{and} \quad \Big|\frac{\|\sigma^x-\sigma^y\|_2}{\sqrt N} - d_X(x,y)\Big| \le \varepsilon \quad \text{for all } x,y\in X,$$
with probability tending to 1 as $N\to\infty$.
Proof. It is easy to see that for each pair of distinct $x, y\in X$, the behavior of the pair $n^{\overline\ell,x}, n^{\overline\ell,y}$ in (4.3) is identical to $n^{\overline\ell,1}, n^{\overline\ell,2}$ in (4.2) (e.g. both pairs have the same joint law with $H_N$).
Applying Lemma 4.1 to all such pairs, we find that the iterates $n^{\overline\ell,x}$ satisfy $\frac{H_N(n^{\overline\ell,x})}{N}\ge \mathcal P(\gamma_*) - \varepsilon$ and $\langle n^{\overline\ell,x}, n^{\overline\ell,y}\rangle_N\simeq q_j$ if $d_X(x,y) = \sqrt{2(1-q_j)}$. The conclusion follows by rounding $n^{\overline\ell,x}\to\sigma^x\in\Sigma_N$ for each $x\in X$ as in the main algorithm.
We remark that our construction differs from the one proposed in [AM20] only in that we construct the vectors $g^{(u)}$ using AMP rather than taking them to be literally independent Gaussian vectors. While the latter construction almost certainly works as well, the analysis seems to require a more general version of state evolution.

Spherical Models
We now consider the case of spherical spin glasses with external field. The law of the Hamiltonian $H_N$ is specified according to the same formula as before depending on $(\xi, L_h)$; however, the state space is the sphere $S^{N-1}(\sqrt N)$ instead of the hypercube. The free energy in this case is given by a similar Parisi-type formula, which however simplifies dramatically under no overlap gap, so we do not give the general formula. At positive temperature the spherical free energy was computed non-rigorously in [CS92] and rigorously in [Tal06a, Che13], but we rely on [CS17] which directly treats the zero-temperature setting.
Remark 5.1. Due to rotational invariance, for spherical models all that matters about $L_h$ is the second moment $\mathbb E_{h\sim L_h}[h^2]$. In particular, unlike the Ising case, there is no loss of generality in assuming $h$ is constant. We continue to work with coordinates $h_i$ sampled i.i.d. from $L_h$ and implicitly use this observation when interpreting the results of [CS17].
Note that when $h = 0$ almost surely, it follows that $q_{sph} = 0$, which is the setting of [Sub21]. Generate initial iterates $(w^{-K}_{sph},\dots,w^0_{sph})$ as in Subsection 2.2. For non-zero $h$ we take $c = \mathbb E[h^2] + \xi'(q_{sph})$, so that the limiting coordinate law satisfies $\mathbb E[W_0^2] = \mathbb E[h^2] + \xi'(q_{sph})$. Generate further iterates via the AMP iteration (5.1).
The next lemma is the spherical analog of Lemmas 3.3, 3.4 and 3.5; the proof is similar to the Ising case and is given in the next subsection.
Lemma 5.2. Using the AMP of (5.1), the asymptotic overlaps and energies satisfy (5.2).

Proof of Theorem 6. The latter two parts of Lemma 5.2 directly imply Theorem 6 in the case that $\mathbb E[h^2] + \xi'(1)\ge\xi''(1)$ (recall $q_{sph} = 1$ in this case). Indeed, it then suffices to take $\sigma_{sph}$ proportional to $m^\ell$ for a sufficiently large constant $\ell$. When $\mathbb E[h^2] + \xi'(1) < \xi''(1)$, we can conclude by mimicking the IAMP phase using the simple non-linearities $u(t,x) = u(t) = \xi''(t)^{-1/2}$ and $v(t,x) = 0$; see also [AMS21, Remark 2.2]. Lemma 3.9 then shows the energy gain from IAMP is $\int_{q_{sph}}^1 \xi''(t)^{1/2}\,dt$. As in the Ising case, we may start IAMP from $m = m^k$ for a large constant $k$. Combining with (5.2) and defining $\sigma_{sph}$ via (5.3) with $m^\ell$ an IAMP iterate, we obtain the desired limiting energy in probability. Alternatively to IAMP, in the spherical setting it is possible to use the approach of [Sub21]. Indeed [Sub21, Theorem 4] immediately extends to an algorithm taking in an arbitrary point $m$ with $\|m\|_N\le 1$ and outputting a point $m_*\in S^{N-1}(\sqrt N)$ (which may depend on $m$) with
$$\frac{H_N(m_*) - H_N(m)}{N} \ge \int_{\|m\|_N^2}^{1}\xi''(t)^{1/2}\,dt - \varepsilon$$
with probability $1 - o_{N\to\infty}(1)$ for any desired $\varepsilon>0$. Either approach completes the proof of Theorem 6.
Lemma 5.3. $\psi_{sph}$ is strictly increasing and convex on $[0,\xi'(q_{sph})]$, and the relations (5.4)–(5.6) hold.

Proof. Since $\xi'$ is strictly increasing and convex and $\phi_{sph}$ is affine and increasing, it follows that $\psi_{sph}$ is strictly increasing and convex. (5.4) is equivalent to the equation $q_{sph}\,\xi''(q_{sph}) = \mathbb E[h^2] + \xi'(q_{sph})$ defining $q_{sph}$. To show (5.5) we use the chain rule. Equations (5.4) and (5.5) and the convexity of $\psi_{sph}$ just shown imply (5.6).

Let $h, W^{-1}_{sph}, (W^j_{sph}, X^j_{sph}, M^j_{sph})_{j=0}^{k}$ be the state evolution limit of the coordinates of the iterates.

Lemma 5.4. For all non-negative integers $0\le j < k$, the corresponding spherical analogs of (3.6)–(3.9) hold.

Proof. Follows from state evolution and induction exactly as in Lemma 3.3.
Proof. As in the proof of Lemma 3.4, the sequence $b_1, b_2,\dots$ must increase up to a limit, and this limit must be a fixed point of $\psi_{sph}$, implying the first claim. The second claim follows by continuity of $\phi_{sph}$.
Proof. We use again the identity expressing $\widetilde H_N(m^k_{sph})/N$ as $\int_0^1\langle m^k_{sph}, \nabla\widetilde H_N(t\,m^k_{sph})\rangle_N\,dt$ and interchange the limit in probability with the integral. To compute the main term $\text{p-lim}_{N\to\infty}\langle m^k_{sph}, \nabla\widetilde H_N(t\,m^k_{sph})\rangle_N$ we introduce an auxiliary AMP step and rearrange as in the Ising case. For the first term, Gaussian integration by parts applies, and integrating with respect to $t$ evaluates the non-linear contribution. Finally the first term gives an energy contribution determined by $\lim_{k\to\infty} b_{k-1} = \xi'(q_{sph})$; since $\psi_{sph}(\xi'(q_{sph})) = \xi'(q_{sph})$ we conclude.

Proof of Lemma 5.2. The result follows from the preceding lemmas.

Proof of Theorem 5
It follows from our algorithm that
$$GS_{sph}(\xi, L_h) \ge q_{sph}\,\xi''(q_{sph})^{1/2} + \int_{q_{sph}}^{1}\xi''(q)^{1/2}\,dq.$$
We now characterize the models in which equality holds, which coincide with those exhibiting no overlap gap. Moreover we give an alternate proof of the lower bound for $GS_{sph}(\xi, L_h)$ which shows that equality holds exactly in no-overlap-gap models. We thank Wei-Kuo Chen for communicating the latter proof.
Proof. We use the results and notation of [CS17]. If $\xi''(q)^{-1/2}$ is concave on $[q_{sph}, 1]$ then the proof of Proposition 2 in [CS17] applies verbatim to show that the support of $\alpha$ is $[q_{sph}, 1]$.
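Numerically, the threshold $q_{sph}$ and the algorithmic energy above are easy to evaluate for a given mixture. In the sketch below, the replica-symmetric value $\sqrt{\mathbb E[h^2]+\xi'(1)}$ used in the $q_{sph}=1$ branch is our reading of the proof of Theorem 6 and should be treated as an assumption; the defining equation $q\,\xi''(q) = \mathbb E[h^2] + \xi'(q)$ for $q_{sph}$ is taken from Lemma 5.3:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.integrate import quad

def algorithmic_energy_spherical(xi_p, xi_pp, Eh2):
    """Evaluate q_sph * xi''(q_sph)^{1/2} + integral of xi''(q)^{1/2} over [q_sph, 1].

    q_sph solves q * xi''(q) = E[h^2] + xi'(q); if no root exists in (0, 1),
    then q_sph = 1 and the (assumed) replica-symmetric value is returned.
    """
    g = lambda q: q * xi_pp(q) - Eh2 - xi_p(q)
    if g(1.0) <= 0:                      # E[h^2] + xi'(1) >= xi''(1): q_sph = 1
        return np.sqrt(Eh2 + xi_p(1.0))
    q_sph = brentq(g, 1e-12, 1.0)
    tail, _ = quad(lambda q: np.sqrt(xi_pp(q)), q_sph, 1.0)
    return q_sph * np.sqrt(xi_pp(q_sph)) + tail

# Pure 3-spin with field variance 0.1: xi(z) = z^3, xi'(z) = 3z^2, xi''(z) = 6z.
print(algorithmic_energy_spherical(lambda z: 3 * z**2, lambda z: 6 * z, 0.1))
```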

Impossibility of Approximate Maximization Under an Overlap Gap
Here we explain the modifications of [GJ21, GJW20] needed to establish Proposition 1.9. Throughout this section we assume that $\xi(t) = \sum_{p\in\{2,4,\dots,2P\}} c_p^2 t^p$ is an even polynomial and that the external field $(h, h,\dots,h)$ is constant and deterministic. We take $q = \inf(\text{supp}(\gamma^U_*))$, let $H_{N,0}, H_{N,1}$ be i.i.d. copies of $H_N$, and for $t\in[0,1]$ define the interpolated family $\mathcal H_N = \{H_{N,t} = \sqrt{1-t}\,H_{N,0} + \sqrt t\,H_{N,1} : t\in[0,1]\}$.

Definition 6.1. The model $(\xi, h)$ satisfies the path overlap gap property with parameters $\varepsilon>0$ and $0<\nu_1<\nu_2<1$ if the following holds with probability at least $1 - e^{-KN}$ for some $K = K(\xi, h) > 0$: for every pair $H_s, H_t\in\mathcal H_N$ and every $\sigma^s, \sigma^t$ satisfying $\frac{H_s(\sigma^s)}{N}, \frac{H_t(\sigma^t)}{N}\ge GS(\xi, L_h) - \varepsilon$, one has $|\langle\sigma^s,\sigma^t\rangle_N|\notin(\nu_1,\nu_2)$.

Definition 6.2. The pair $(H, H')$ of Hamiltonians are $\nu$-separated above $\mu$ if for any $x, y\in\Sigma_N$ with $H(x)\ge\mu$, $H'(y)\ge\mu$, it holds that $|\langle x,y\rangle_N|\le\nu$.

The key point, generalizing [CGPR19], is that if $\gamma^U_*$ is not optimizable then
$$\mathbb E\big[(\partial_x\Phi_{\gamma_*}(t,X_t))^2\big] > t \tag{6.1}$$
holds for some $t\in[0,1)$, where $\gamma_* = \gamma^U_*$. The remainder of the proof (just below [CGPR19, Lemma 5.4]) is fully general and we give an outline below. The point is that by continuity, (6.1) must hold in some non-empty open subset of $(0,1)$, thus in a non-empty interval $[a,b]$. For each $t\in[a,b]$, one considers the Hamiltonian $\frac{H_N(\sigma) + H_N(\sigma')}{2}$ on two-replica configurations $(\sigma, \sigma')$ with overlap constraint $\langle\sigma,\sigma'\rangle_N = t$. The free energy of this constrained system can be upper-bounded using an interpolation argument; the relevant Parisi order parameter $\gamma$ must increase, except that it may decrease by a factor of at most 2 at $t$, i.e. it only needs to satisfy
$$\lim_{s\to t^+}\gamma(s) \ge \frac12\lim_{s\to t^-}\gamma(s). \tag{6.2}$$
Taking $\gamma = \gamma^U_*$ recovers the single-replica value. However when (6.1) holds, $\gamma = \gamma^U_*$ is no longer locally optimal, since $\gamma$ lives in a larger function space due to the relaxation (6.2). Hence the constrained two-replica ground state energy is strictly smaller. This argument can be applied for all $O(N)$ values $t\in[a,b]\cap\mathbb Z/N$, yielding the result.

Lemma 6.5. If $(\xi, h)$ is not optimizable, then there exist $\varepsilon(\xi, h) > 0$, $0\le\nu_1<\nu_2\le 1$ and $K > 0$ such that with probability at least $1 - e^{-KN}$:

1. The model $(\xi, h)$ satisfies the path overlap gap property with parameters $(\varepsilon, \nu_1, \nu_2)$.
2. The pair $(H_{N,0}, H_{N,1})$ is $\nu_1$-separated above $(GS(\xi, L_h) - \varepsilon)N$.
Proof. The proof is identical to [GJ21, Theorem 3.4]. In short, one discretizes $\mathcal H_N$ into $\{H_{N,k\delta} : 0\le k\le\delta^{-1}\}$ for some small $\delta>0$ using Proposition 2.1, and then applies Lemma 6.3 to control the values $H_{N,s}(\sigma^s), H_{N,t}(\sigma^t)$ for $s\ne t$ and Lemma 6.4 to control the cases that $s = t$. Indeed in the proof of [GJ21, Theorem 3.4], the former is accomplished using [CGPR19, Theorem 3] while the latter is accomplished using [CHL18, Theorem 2]. The preceding lemmas exactly generalize the relevant statements to non-optimizable models.
Proof of Proposition 1.9. Given Lemma 6.5, the proof is identical to that of [GJ21, Theorem 3.3]. Indeed, that proof does not depend on $\xi$. The main input is [GJ21, Conjecture 3.2]. This is shown to be implied by the combination of [GJ21, Theorem 3.4] and [GJ21, Conjecture 3.6]. Lemma 6.5 above suitably extends the former, while the latter (for general $(\xi, h)$) is the main result of our subsequent work [Sel21]. The proof of [GJ21, Theorem 3.3] also uses [GJ21, Theorems 4.2 and 6.1]; these follow from general concentration of measure results on Gaussian space and easily extend to general $\xi$.
We remark that Lemma 6.5 is also the only property of pure even $p$-spin models used in [GJW20, Theorem 2.4] to rule out approximate maximization (in a slightly weaker sense) by constant-degree polynomials. Therefore their result also applies under the more general conditions of Proposition 1.9.
Proof of Lemmas 1.4, 1.5 and 2.9

We first recall several existing results.
Proof. The proof is identical to [AC15, Theorem 2] and [CHL18, Lemma 5], which show strict convexity on $U$.

Throughout this section we let $\gamma^{L_q}_*$ be the minimizer of $\mathcal P$ over $L_q$, assuming it exists. Note that we will eventually show in Lemma 1.4 that $\gamma^{L_q}_* = \gamma^L_*$ if either minimizer exists.
Proof. By Lemma 7.4, $[q,1)\setminus\text{supp}(\gamma^{L_q}_*)$ is a countable union of disjoint intervals, open in $[q,1)$. First assume that at least one of these intervals is of the form $(t_1, t_2)$ with $q\le t_1 < t_2 < 1$. By Lemma 7.6 and Corollary 7.7 we know that (7.4) and (7.5) hold. Further, for $t\in(t_1,t_2)$, $\Phi_{\gamma^{L_q}_*}$ solves the PDE
$$\partial_t\Phi(t,x) + \frac{\xi''(t)}{2}\,\partial_{xx}\Phi(t,x) = 0,$$
which is simply the heat equation up to a time change. We therefore obtain a Gaussian convolution representation of $\Phi_{\gamma^{L_q}_*}(t,\cdot)$ in terms of $\Phi_{\gamma^{L_q}_*}(t_2,\cdot)$. Differentiating this equation and using dominated convergence (recall that $\partial_{xx}\Phi_{\gamma^{L_q}_*}(t_2,x)$ is bounded by Proposition 2.6), we obtain a formula for the time derivative. Because $dX_t = \sqrt{\xi''(t)}\,dB_t$ on this interval, we can rewrite the last equation in terms of the law of $X_t$. By Jensen's inequality, and using Eq. (7.4) in the last step, together with Corollary 7.1 and the fact that $t\mapsto\xi''(t)$ is increasing, we obtain an estimate for $t\in(t_1,t_2)$ in contradiction with Eq. (7.5). Therefore $[q,1)\setminus\text{supp}(\gamma^{L_q}_*)$ is either empty or consists of a single interval $(t_1, 1)$.
In the next lemma, we show that minimization of $\mathcal P$ over $L$ subsumes minimization over $L_q$. A priori, one might expect that tuning the value of $q$ could lead to many different minima.
Proof. Using [AMS21, Proposition 6.1(c)] and continuity, we have $|\mathcal P(\gamma) - \mathcal P(\gamma^{(\varepsilon)})| \le C\,\|\gamma - \gamma^{(\varepsilon)}\|$ for a constant $C$ depending only on $\xi$. The right-hand side tends to zero as $\varepsilon\to 0$ by the definition of $L$.
Lemma 7.11. Suppose $\gamma^{L_q}_*$ exists. Then it also minimizes $\mathcal P$ over $L$.

Proof. Define $f(t) = \mathbb E[(\partial_x\Phi_{\gamma^{L_q}_*}(t,X_t))^2]$. First we show that $f(t)\ge t$ for all $0\le t\le q$. Recall that $X_t$ is simply a time-changed Brownian motion on $0\le t\le q$ while $\Phi_{\gamma^{L_q}_*}$ solves the time-changed heat equation on the same time interval; therefore $\partial_x\Phi_{\gamma^{L_q}_*}(t,X_t) = \mathbb E[\partial_x\Phi_{\gamma^{L_q}_*}(q, X_q)\,|\,X_t]$. By Jensen's inequality, it follows that $f'(t)\le 1$ for all $0\le t\le q$; in the last step we used that $\xi''$ is increasing, as $\xi$ is a power series with non-negative coefficients. Next, from Lemma 7.6 and Lemma 7.9 it follows that $f(q) = q$. In light of Corollary 7.1, we showed just above that $f'(t)\le 1$ for $t\le q$. It now follows that $f(t)\ge t$ for all $0\le t\le q$.
We now restate and prove Lemmas 1.4 and 1.5.
Lemma 1.4 (restated). Fix $q\in[0,1)$ and $\gamma_*\in L_q$. The following are equivalent: (i) $\gamma_*$ is $q$-optimizable; (ii) $\mathcal P(\gamma_*) = \inf_{\gamma\in L}\mathcal P(\gamma)$; (iii) $\mathcal P(\gamma_*) = \inf_{\gamma\in L_q}\mathcal P(\gamma)$. Moreover if a minimizer exists in either variational problem just above, then it is unique.
Proof. Lemma 7.5 immediately implies uniqueness of minimizers. The second statement immediately implies the third, while Lemma 7.11 provides the converse result. To show that the first statement implies the third, we observe that Proposition 7.3 immediately yields $\frac{d}{ds}\mathcal P\big((1-s)\gamma_* + s\gamma\big)\big|_{s=0^+} = 0$ for any $\gamma\in L_q$ when $\gamma_*$ is optimizable; this implies the third statement by again invoking Lemma 7.5. It only remains to show that if $\mathcal P(\gamma_*) = \inf_{\gamma\in L}\mathcal P(\gamma)$, then $\gamma_*$ is $q$-optimizable, which follows from Lemmas 7.6 and 7.9.
Finally we turn to Lemma 2.9.
Proof of Lemma 2.9. First, (2.7) is clear given Corollary 7.1. To establish (2.8), we first show (7.7). Indeed (7.7) follows by using Itô's formula to compute $d\,\partial_{xx}\Phi_{\gamma_*}(t,X_t)$ and taking the second derivative with respect to $x$ of the Parisi PDE. In particular (7.7) yields a comparison valid for all $t\in[0,1)$. Therefore to show (2.8) it suffices to show $\lim_{t\to 1}\mathbb E[\partial_{xx}\Phi_{\gamma_*}(t,X_t)]\ge 0$, but this is clear by convexity of $\Phi_{\gamma_*}(t,\cdot)$.

Incremental AMP Proofs
We will prove Lemma 8.3, which generalizes Lemma 3.6 to the setting of branching AMP and describes the limiting Gaussian processes $N^\delta_{\ell,a}, Z^\delta_{\ell,a}$. We recall the setup of Section 4 and in particular continue to use the value $q_B\in(q,1)$ to define the time $\ell^\delta_{q_B}$ at which the equality $n^{\ell,1} = n^{\ell,2}$ last holds. For the branching setting we slightly generalize the filtration (3.13); crucially, we restrict here to $k\ge 0$, i.e. we do not include the preparatory iterates with negative index. We remark that if we consider all the IAMP iterates $(Z^\delta_{\ell,a}, N^\delta_{\ell,a})$ together in the linear order given by $(\ell,a)\mapsto 2\ell + a$, then these are iterates of a standard AMP algorithm, since each iterate depends only on the previous ones. (This is because in (2.4) the last expectation is zero when $f_\ell$ does not depend on $Z_j$, which is the case if $f_\ell$ and $Z_j$ correspond to different values of $a$ after the two branches separate.) Moreover it is easy to see that the Onsager correction terms are not affected by this rewriting. Therefore we may continue to use state evolution in the natural way even though we do not think of the iterates as actually being totally ordered.
Proof. We proceed by induction over $\ell$, the base case $\ell = 0$ following from Proposition 2.4. Because the random variables $Z^\delta_{k,a}$ form a Gaussian process, it suffices to verify that $\mathbb E[Z^\delta_{\ell,a} Z_{-j}] = 0$ holds whenever $j > J_\ell$. By state evolution, this covariance is controlled by the previous iterate, and by definition $N^\delta_{\ell-1,a}$ is $\mathcal F^\delta_{\ell-1}$-measurable. Since $\xi'(0) = 0$ it suffices to show that $\mathcal F^\delta_{\ell-1}$ is independent of $Z_{-j-1}$. By the inductive hypothesis, this holds if $j + 1 > J_{\ell-1}$. This in turn follows from the easy-to-verify fact that $J_\ell - 1\ge J_{\ell-1}$, completing the proof.
Corollary 8.2. Let $G^\delta_{q_j,a}$ be the state evolution limit of $g^{(q_j,a)}$ for each $(j,a)$; then $(G^\delta_{q_i,1}, G^\delta_{q_i,2})\sim N(0, I_2)$, independently of the relevant filtration.

Proof. Since $k_{i,1}\ne k_{i,2}$ it follows from Proposition 2.4 that $(G^\delta_{q_i,1}, G^\delta_{q_i,2})\sim N(0, I_2)$ holds as an unconditional law. Since we chose the values $k_{i,a}$ such that $k_{i,a} - \ell^\delta_{q_i} > k_{j,a'} - \ell^\delta_{q_j} > 0$ for any $i > j$ and $a, a'\in\{1,2\}$, it follows that $k_{i,a} > J_{\ell^\delta_{q_i}-1}$. Applying Lemma 8.1 now concludes the proof.
Base Case for Equations (8.1), (8.2), (8.5), (8.6). Note that here, none of the perturbations $g^{(q_i,a)}$ appear yet. We begin with Equation (8.1): to see that the relevant expression vanishes, it suffices to use the fact, established above, that $\mathbb E[Z^\delta_{\ell+1,a}\,|\,Z^\delta_{\ell,a}] = Z^\delta_{\ell,a}$. Next we verify the base case for Equation (8.2), using the base case of Equation (8.1) in the first step. Continuing, we verify the base case for Equation (8.5). First note that $X^\delta_{\ell,a}$ is $\mathcal F^\delta_\ell$-measurable and $\mathbb E[Z^\delta_{\ell+1,a} - Z^\delta_{\ell,a}\,|\,\mathcal F^\delta_\ell] = 0$ as deduced above. Finally for Equation (8.6), using the martingale property again: the relevant step follows from the definition of $u^\delta_\ell$, and we can multiply the two expectations because $\mathbb E[(Z^\delta_{\ell+1,a} - Z^\delta_{\ell,a})^2\,|\,\mathcal F^\delta_{\ell,a}]$ is constant while the other term is $\mathcal F^\delta_{\ell,a}$-measurable.
We continue to Equation (8.2); using Equation (8.1), just proven, in the first step gives the inductive step. Next we show Equation (8.5) continues to hold. If $\ell + 1 = \ell^\delta_{q_i}$ for some $i\in[m]$, the claim again follows from Corollary 8.2. When $\ell + 1\ne\ell^\delta_{q_i}$ for all $i$, it follows from the definition of the sequence $N^\delta_{\ell,a}$ and the just-proven fact that $(Z^\delta_{k,a})_{k\ge\ell+1}$ forms a martingale sequence. Finally we show Equation (8.6) continues to hold inductively. Again for $\ell + 1 = \ell^\delta_{q_i}$ it follows from Corollary 8.2, and otherwise by definition.

Diffusive Scaling Limit
We begin with the following slight generalization of Lemma 3.7 which allows for the additional perturbation steps of branching IAMP but still considers only a single sample path.
Lemma 8.4. Fix $\overline q\in(q,1)$ and an index $a$. There exists a coupling between the families of triples $\{(Z^\delta_{\ell,a}, X^\delta_{\ell,a}, N^\delta_{\ell,a})\}_{\ell\ge 0}$ and $\{(Z_t, X_t, N_t)\}_{t\ge 0}$ such that the following holds for a constant $C>0$: for large enough $\underline\ell$, and every $\ell\ge\underline\ell$ with $q_\ell\le\overline q$,
$$\max_{\underline\ell\le j\le\ell}\ \mathbb E\big[(X^\delta_{j,a} - X_{q_j})^2\big] \le C\delta. \tag{8.9}$$

Proof. We prove the scaling limits for $X^\delta_\ell$ and $N^\delta_\ell$ separately, inducting over $\ell$ in each proof. We suppress the index $a$ as it is irrelevant.

Scaling limit for $X^\delta_\ell$
We begin by checking the claim for $\ell = \underline\ell$, recalling the initialization. We then continue using a standard self-bounding argument. Let $\ell\ge\underline\ell + 1$ be such that $q_\ell\le\overline q$, and define $\Delta^X_\ell = X^\delta_\ell - X_{q_\ell}$. In the resulting expansion, the first term is at most $C\int_{q^\delta_{j-1}}^{q^\delta_j}|X^\delta_j - X_t|\,dt$, since $v$ is Lipschitz in space uniformly for $t\in[0,1]$; the second term is estimated using the strong total variation of $v$. Combining the bounds, summing over $j$, squaring and taking expectations, the middle term is proportional to $(\ell - \underline\ell)^2\delta^3$. Using $(\ell-\underline\ell)\delta\le 1$, we obtain that for $\delta$ smaller than an absolute constant,
$$\mathbb E\big[(\Delta^X_\ell)^2\big] \le C\delta$$
for a different absolute constant $C$, as desired.