The $\ell^p$-Gaussian-Grothendieck problem with vector spins

We study the vector spin generalization of the $\ell^p$-Gaussian-Grothendieck problem. In other words, given an integer $\kappa\geq 1$, we investigate the asymptotic behaviour of the ground state energy associated with the Sherrington-Kirkpatrick Hamiltonian indexed by vector spin configurations in the unit $\ell^p$-ball. The ranges $1\leq p\leq 2$ and $2<p<\infty$ exhibit significantly different behaviours. When $1\leq p\leq 2$, the vector spin generalization of the $\ell^p$-Gaussian-Grothendieck problem agrees with its scalar counterpart; in particular, its re-scaled limit is proportional to some norm of a standard Gaussian random variable. On the other hand, for $2<p<\infty$, the re-scaled limit of the $\ell^p$-Gaussian-Grothendieck problem with vector spins is given by a Parisi-type variational formula.


Introduction and main results
Given an $N \times N$ matrix $A = (a_{ij})$ and some $1 \leq p < \infty$, the $\ell^p$-Grothendieck problem consists in maximizing the quadratic form $\sum_{i,j=1}^N a_{ij} \sigma_i \sigma_j$ over all vectors $\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_N) \in \mathbb{R}^N$ with unit $\ell^p$-norm, $\|\boldsymbol{\sigma}\|_p^p = \sum_{i=1}^N |\sigma_i|^p = 1$. In other words, it involves computing the quantity
$$\mathrm{GP}_{N,p}(A) = \max_{\|\boldsymbol{\sigma}\|_p = 1} \sum_{i,j=1}^N a_{ij} \sigma_i \sigma_j. \tag{1}$$
In the case $p = 2$, this is the maximum eigenvalue of the symmetric matrix $(A + A^T)/2$. On the other hand, the limiting case $p = \infty$ has been extensively studied in the mathematics and computer science literature for its applications to combinatorial optimization, graph theory and correlation clustering [1,5,10,14,26]. The range $2 < p < \infty$ can be thought of as an interpolation between the spectral and the correlation clustering problems [21], while the range $1 < p < 2$ seems to be unexplored in the literature. Finding an efficient algorithm to solve the $\ell^p$-Grothendieck problem whenever $p \neq 2$ is generally difficult [18,20,24,25,27], so it is natural to study the $\ell^p$-Grothendieck problem for random input matrices first; this should help understand the typical behaviour of (1). Doing so leads to the $\ell^p$-Gaussian-Grothendieck problem,
$$\mathrm{GP}_{N,p} = \max_{\|\boldsymbol{\sigma}\|_p = 1} \sum_{i,j=1}^N g_{ij} \sigma_i \sigma_j, \tag{2}$$
where $(g_{ij})$ are independent standard Gaussian random variables.
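For the case $p = 2$ just mentioned, the identification of (1) with the top eigenvalue of $(A + A^T)/2$ can be checked numerically. The following sketch is illustrative only (the matrix size and random draws are arbitrary): the top eigenvector attains the maximum, and no random unit vector exceeds it.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
A = rng.standard_normal((N, N))

# The quadratic form (A x, x) only sees the symmetric part of A, so for
# p = 2 the maximum over the unit l^2-sphere is the top eigenvalue of
# (A + A^T)/2, attained at the corresponding eigenvector.
sym = (A + A.T) / 2
vals, vecs = np.linalg.eigh(sym)
lam_max, v = vals[-1], vecs[:, -1]

# the top eigenvector attains the value in (1)...
assert abs(v @ A @ v - lam_max) < 1e-10

# ...and no other unit vector exceeds it
for _ in range(1000):
    x = rng.standard_normal(N)
    x /= np.linalg.norm(x)
    assert x @ A @ x <= lam_max + 1e-10
```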
The asymptotic behaviour of (2) was studied in great detail in [6]. It was discovered that the re-scaled limit of (2) exhibits significantly different behaviour in the ranges 1 ≤ p ≤ 2 and 2 < p < ∞; in the former, it is proportional to some norm of a Gaussian random variable, and in the latter, it is given by a Parisi-type variational formula. In this paper, we will show that this behaviour persists in the vector spin generalization of (2). Our work is motivated and greatly influenced by [6]; however, new ideas are needed to treat the range 2 < p < ∞. These will be detailed at a later stage, and they will allow us to avoid the key truncation step in [6] as well as its associated technicalities. Therefore, specializing our arguments to the scalar setting, κ = 1, yields a simpler proof of the main result in [6].
Before we describe the vector spin analogue of (2), let us mention that another motivation for investigating this optimization problem comes from the study of spin glass models. In the language of statistical physics, the quadratic form ∑ N i, j=1 g i j σ i σ j is known as the Hamiltonian of the Sherrington-Kirkpatrick (SK) mean-field spin glass model, and the quantity (2) corresponds to the ground state energy of the SK model on the unit ℓ p -sphere. From this perspective, the vector spin generalization of (2) which we will study in this paper is very natural; it also appears in the computer-science literature [1,5,14,19,21] when studying the convex relaxation of (2).
Let us now describe the vector spin generalization of (2) using the notation and terminology of statistical physics. Fix an integer $\kappa \geq 1$ throughout the remainder of this paper. The Hamiltonian of the vector spin SK model is the random function of the $N \geq 1$ vector spins taking values in $\mathbb{R}^\kappa$ given by the quadratic form
$$H_N(\vec{\boldsymbol{\sigma}}) = \sum_{i,j=1}^N g_{ij} (\vec{\sigma}_i, \vec{\sigma}_j), \tag{4}$$
where the interaction parameters $(g_{ij})$ are independent standard Gaussian random variables and $(\cdot,\cdot)$ is the Euclidean inner product on $\mathbb{R}^\kappa$. Denote the coordinates of each spin $\vec{\sigma}_i$ by $\vec{\sigma}_i = (\sigma_i(1), \ldots, \sigma_i(\kappa)) \in \mathbb{R}^\kappa$, write the configuration of the $k$'th coordinates as $\boldsymbol{\sigma}(k) = (\sigma_1(k), \ldots, \sigma_N(k)) \in \mathbb{R}^N$, and introduce the $\ell^{p,2}$-norm on the Euclidean space $(\mathbb{R}^\kappa)^N$,
$$\|\vec{\boldsymbol{\sigma}}\|_{p,2} = \Big( \sum_{i=1}^N \|\vec{\sigma}_i\|_2^p \Big)^{1/p}. \tag{7}$$
The $\ell^p$-Gaussian-Grothendieck problem with vector spins consists in maximizing the Hamiltonian (4) over the unit $\ell^{p,2}$-sphere. In other words, it involves computing the quantity
$$\mathrm{GP}^\kappa_{N,p} = \max_{\|\vec{\boldsymbol{\sigma}}\|_{p,2} = 1} \sum_{i,j=1}^N g_{ij} (\vec{\sigma}_i, \vec{\sigma}_j). \tag{8}$$
To handle the range $1 \leq p \leq 2$, we will use the Gaussian Hilbert space approach to the Grothendieck inequality [1,5,14,21] in order to show that for any $N \times N$ matrix $A = (a_{ij})$,
$$\max_{\|\vec{\boldsymbol{\sigma}}\|_{p,2} = 1} \sum_{i,j=1}^N a_{ij} (\vec{\sigma}_i, \vec{\sigma}_j) = \mathrm{GP}_{N,p}(A). \tag{9}$$
This identity was mentioned in [19], but no proof was given. Combining (9) with theorem 1.1 and theorem 1.2 in [6] will immediately give our main result for $1 \leq p \leq 2$.
Theorem 1.1. If $1 < p < 2$, then almost surely, where $p^*$ is the Hölder conjugate of $p$ and $g$ is a standard Gaussian random variable. On the other hand, if $p = 1$ or $p = 2$, then almost surely,

The range $2 < p < \infty$ will require substantially more work, and will occupy the majority of this paper. It will be convenient to introduce a re-scaled version of the Hamiltonian (4), as well as a normalized version of the $\ell^{p,2}$-norm (7),
$$H_N(\vec{\boldsymbol{\sigma}}) = \frac{1}{\sqrt{N}} \sum_{i,j=1}^N g_{ij} (\vec{\sigma}_i, \vec{\sigma}_j), \tag{12}$$
$$|||\vec{\boldsymbol{\sigma}}|||_{p,2} = \Big( \frac{1}{N} \sum_{i=1}^N \|\vec{\sigma}_i\|_2^p \Big)^{1/p}. \tag{13}$$
If we denote the classical SK Hamiltonian on $\mathbb{R}^N$ by
$$H_N(\boldsymbol{\sigma}) = \frac{1}{\sqrt{N}} \sum_{i,j=1}^N g_{ij} \sigma_i \sigma_j, \tag{14}$$
we may express the vector spin Hamiltonian (12) as
$$H_N(\vec{\boldsymbol{\sigma}}) = \sum_{k=1}^\kappa H_N(\boldsymbol{\sigma}(k)). \tag{15}$$
It is readily verified that for two spin configurations $\vec{\boldsymbol{\sigma}}_l, \vec{\boldsymbol{\sigma}}_{l'} \in (\mathbb{R}^\kappa)^N$ and two integers $1 \leq k, k' \leq \kappa$,
$$\mathbb{E}\, H_N(\boldsymbol{\sigma}_l(k)) H_N(\boldsymbol{\sigma}_{l'}(k')) = N R(\boldsymbol{\sigma}_l(k), \boldsymbol{\sigma}_{l'}(k'))^2, \tag{16}$$
where
$$R(\boldsymbol{\sigma}_l(k), \boldsymbol{\sigma}_{l'}(k')) = \frac{1}{N} \big( \boldsymbol{\sigma}_l(k), \boldsymbol{\sigma}_{l'}(k') \big) \tag{17}$$
is the overlap between $\boldsymbol{\sigma}_l(k)$ and $\boldsymbol{\sigma}_{l'}(k')$. We will denote the matrix of all such overlaps by
$$R(\vec{\boldsymbol{\sigma}}_l, \vec{\boldsymbol{\sigma}}_{l'}) = \big( R(\boldsymbol{\sigma}_l(k), \boldsymbol{\sigma}_{l'}(k')) \big)_{1 \leq k, k' \leq \kappa}. \tag{18}$$
The covariance structure of the vector spin Hamiltonian (12) may therefore be expressed in terms of this matrix-valued overlap as
$$\mathbb{E}\, H_N(\vec{\boldsymbol{\sigma}}_l) H_N(\vec{\boldsymbol{\sigma}}_{l'}) = N \| R(\vec{\boldsymbol{\sigma}}_l, \vec{\boldsymbol{\sigma}}_{l'}) \|_{\mathrm{HS}}^2, \tag{19}$$
where $\|\gamma\|_{\mathrm{HS}}^2 = \sum_{k,k'} |\gamma_{k,k'}|^2$ denotes the Hilbert-Schmidt norm on the space of $\kappa \times \kappa$ matrices.
Writing
$$S_{N,p} = \big\{ \vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N \,\big|\, |||\vec{\boldsymbol{\sigma}}|||_{p,2} = 1 \big\} \tag{20}$$
for the unit normalized-$\ell^{p,2}$-sphere, the $\ell^p$-Gaussian-Grothendieck problem with vector spins may be recast as the task of computing the ground state energy
$$\mathrm{GSE}_{N,p} = \frac{1}{N} \max_{\vec{\boldsymbol{\sigma}} \in S_{N,p}} H_N(\vec{\boldsymbol{\sigma}}). \tag{21}$$
Using Chevet's inequality as in section 3 of [6], it is easy to see that this is the correct scaling of (8) when $2 < p < \infty$. To study the constrained optimization problem (21), it is natural to remove the normalized-$\ell^{p,2}$-norm constraint by considering the model with an $\ell^{p,2}$-norm potential. For each $t > 0$ define the Hamiltonian
$$H_{N,p,t}(\vec{\boldsymbol{\sigma}}) = H_N(\vec{\boldsymbol{\sigma}}) - tN |||\vec{\boldsymbol{\sigma}}|||_{p,2}^p \tag{22}$$
and introduce the unconstrained Lagrangian
$$L_{N,p}(t) = \frac{1}{N} \max_{\vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N} H_{N,p,t}(\vec{\boldsymbol{\sigma}}). \tag{23}$$
Our first noteworthy result in the range $2 < p < \infty$, which we now describe, will relate the asymptotic behaviour of the unconstrained Lagrangian (23) to the limit of the ground state energy (21). Consider the space of $\kappa \times \kappa$ Gram matrices,
$$\Gamma_\kappa = \big\{ \gamma \in \mathbb{R}^{\kappa \times \kappa} \,\big|\, \gamma \text{ is symmetric and non-negative definite} \big\}, \tag{24}$$
endowed with the Loewner order, $\gamma_1 \leq \gamma_2$ if and only if $\gamma_2 - \gamma_1 \in \Gamma_\kappa$, and denote by
$$\Gamma^+_\kappa = \big\{ \gamma \in \Gamma_\kappa \,\big|\, \gamma \text{ is positive definite} \big\} \tag{25}$$
the subspace of positive definite matrices in $\Gamma_\kappa$. For each $D \in \Gamma_\kappa$ write
$$\Sigma(D) = \big\{ \vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N \,\big|\, R(\vec{\boldsymbol{\sigma}}, \vec{\boldsymbol{\sigma}}) = D \big\} \tag{26}$$
for the set of spin configurations $\vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N$ with self-overlap $D$, and introduce the constrained Lagrangian
$$L_{N,p,D}(t) = \frac{1}{N} \max_{\vec{\boldsymbol{\sigma}} \in \Sigma(D)} H_{N,p,t}(\vec{\boldsymbol{\sigma}}). \tag{27}$$
In section 3, we will show that the constrained Lagrangian (27) admits a deterministic limit $L_{p,D}(t)$ with probability one, and in section 5, we will establish the following asymptotic formula for the unconstrained Lagrangian (23).
Theorem 1.2. If $2 < p < \infty$, then almost surely the limit $L_p(t) = \lim_{N\to\infty} L_{N,p}(t)$ exists for every $t > 0$. Moreover, with probability one,

Subsequently, in section 6, we will use the basic properties of convex functions to derive the following key relationship between the limits of (23) and (21).

Theorem 1.3. If $2 < p < \infty$, then almost surely the limit $\mathrm{GSE}_p = \lim_{N\to\infty} \mathrm{GSE}_{N,p}$ exists and is given by for every $t > 0$.
This result reduces the study of the ground state energy (21) to that of the Lagrangian (27) with positive definite self-overlaps D ∈ Γ + κ . The main result of this paper will be a Parisi-type variational formula for the limit L p,D (t) of (27) when D ∈ Γ + κ . Together with (29), (28) and (21), this will give a Parisi-type variational formula for the ℓ p -Gaussian-Grothendieck problem with vector spins when 2 < p < ∞.
Given $D \in \Gamma^+_\kappa$, we now describe the Parisi-type variational formula for the limit $L_{p,D}(t)$ of (27). Let us call a path $\pi : [0,1] \to \Gamma_\kappa$ piecewise linear if there exist a partition $0 = q_{-1} \leq q_0 \leq \ldots \leq q_r = 1$ of $[0,1]$ and matrices $(\gamma_j)_{-1 \leq j \leq r} \subset \Gamma_\kappa$ with
$$\pi(s) = \gamma_{j-1} + \frac{s - q_{j-1}}{q_j - q_{j-1}}\,(\gamma_j - \gamma_{j-1}) \tag{30}$$
when $s \in [q_{j-1}, q_j]$ for some $0 \leq j \leq r$. Denote by $\Pi$ the space of piecewise linear and non-decreasing functions on $[0,1]$ with values in $\Gamma_\kappa$, and write $\Pi_D$ for the set of paths in $\Pi$ that start at $0$ and end at $D$,
$$\Pi_D = \big\{ \pi \in \Pi \,\big|\, \pi(0) = 0 \text{ and } \pi(1) = D \big\}. \tag{31}$$
Notice that any path $\pi \in \Pi_D$ can be identified with the two sequences of parameters
$$0 = q_{-1} \leq q_0 \leq \ldots \leq q_r = 1 \quad \text{and} \quad 0 = \gamma_{-1} \leq \gamma_0 \leq \ldots \leq \gamma_r = D$$
satisfying $\pi(q_j) = \gamma_j$ for $0 \leq j \leq r$; explicitly, the path $\pi$ is given by (30) when $s \in [q_{j-1}, q_j]$ for some $0 \leq j \leq r$. Denote by $\mathcal{N}_d$ the set of finite measures on $[0,1]$ with finitely many atoms, and given $t > 0$ and $\lambda \in \mathbb{R}^{\kappa(\kappa+1)/2}$ consider the function $f^\infty_\lambda$ that will serve as the terminal condition (36). Notice that any discrete measure $\zeta \in \mathcal{N}_d$ may be identified with two sequences of parameters, namely its atoms and their weights; moreover, the sequence of atoms can be taken to coincide with the partition points $(q_j)$ of $\pi$ by duplicating values in the corresponding sequences if necessary. We will often abuse notation and write $\zeta$ both for the measure and its cumulative distribution function. Given independent Gaussian vectors $z_j = (z_j(k))_{k \leq \kappa}$ for $0 \leq j \leq r$ with an appropriate covariance structure, recursively define the sequence $(Y^{\lambda,\zeta,\pi}_l)_{0 \leq l \leq r}$. This inductive procedure is well-defined by the growth bounds established in lemma A.2. Introduce the Parisi functional, where $\mathrm{Sum}(\gamma) = \sum_{k,k'} \gamma_{k,k'}$ is the sum of all elements of a $\kappa \times \kappa$ matrix and $\odot$ denotes the Hadamard product on the space of $\kappa \times \kappa$ matrices. We have made all dependencies on $\kappa$, $p$, $t$ and $D$ implicit for clarity of notation, but we will make them explicit whenever necessary. The following is our main result.
We close this section with a brief outline of the paper. Section 2 will be devoted to the range 1 ≤ p ≤ 2 and will include a proof of theorem 1.1. The rest of the paper will focus on the range 2 < p < ∞. In section 3, we will use the Guerra-Toninelli interpolation [13,29] and the Gaussian concentration inequality [28,29] to show that the constrained Lagrangian (27) admits a deterministic limit. In section 4, we show that, in a certain sense, the limit of the constrained Lagrangian depends continuously on the constraint. This continuity result is inspired by lemma 7.1 in [6]. Unfortunately, lemma 7.1 in [6] does not extend to the vector spin setting since we can no longer modify overlaps by simply re-scaling spin configurations. To overcome this issue, we will revisit lemma 4 in [33], originally designed to prove a vector spin version of the Ghirlanda-Guerra identities [12], and we will leverage Dudley's entropy inequality [9,35]. With this continuity result at hand, we will closely follow section 7 and section 8 in [6] to prove theorem 1.2 and theorem 1.3. This will be the content of section 5 and section 6. In section 7, we will introduce a free energy functional that depends on an inverse temperature parameter β > 0 and is asymptotically equivalent to the constrained Lagrangian (27) after letting β → ∞. For each finite β > 0, a simple modification of the arguments in [33], which we will not detail, gives a Parisi-type variational formula for the limit of the free energy functional. This is reviewed in section 8. The rest of the paper is devoted to finding a similar Parisi-type variational formula after letting β → ∞. This is where our approach differs substantially from that in [6]. 
In our attempt to generalize the truncation argument in sections 10-12 of [6] to the vector spin setting, we discovered that, by a careful analysis of the terminal condition (36) and its positive temperature analogue, the proof for the scalar case, $\kappa = 1$, could be considerably simplified. This simplified proof extends with minor modifications to the vector spin setting, and it is what we present in sections 9 through 11 of this paper. In particular, our arguments can be used to simplify the proof of the main result in [6]. The careful analysis of the terminal conditions is undertaken in section 9. The resulting bounds are combined with the Auffinger-Chen representation [4,16] in section 10 to compare the Parisi functional (42) and its positive temperature counterpart. The specific form of the Auffinger-Chen representation that we use is a higher dimensional generalization of that in [6,7]. The proof of theorem 1.4 is finally completed in section 11. For the reader's convenience, we have postponed a number of technical estimates to appendix A, and we have included a review of elementary results in linear algebra in appendix B.
The range 1 ≤ p ≤ 2

In this section we show that the $\ell^p$-Gaussian-Grothendieck problem with vector spins agrees with its scalar counterpart in the range $1 \leq p \leq 2$ by proving (9). Recall the definition (1) of the $\ell^p$-Grothendieck problem $\mathrm{GP}_{N,p}(A)$ for an arbitrary $N \times N$ matrix $A = (a_{ij})$.
Proof. Given $\boldsymbol{\sigma} \in \mathbb{R}^N$ in the unit $\ell^p$-sphere, consider the vector spin configuration $\vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N$ defined by embedding each coordinate into the first component, $\vec{\sigma}_i = (\sigma_i, 0, \ldots, 0)$; it satisfies $\|\vec{\boldsymbol{\sigma}}\|_{p,2} = \|\boldsymbol{\sigma}\|_p = 1$ and leaves the quadratic form unchanged, and taking the maximum over all such $\boldsymbol{\sigma} \in \mathbb{R}^N$ gives the upper bound in (44). To prove the matching lower bound, fix a vector spin configuration $\vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N$ in the unit $\ell^{p,2}$-sphere. Let $g$ be a standard Gaussian random vector in $\mathbb{R}^\kappa$ and for each $1 \leq i \leq N$ consider the random variable $(g, \vec{\sigma}_i)$; these random variables have covariance $\mathbb{E}(g, \vec{\sigma}_i)(g, \vec{\sigma}_j) = (\vec{\sigma}_i, \vec{\sigma}_j)$, so the quadratic form over vector spins can be expressed as an expectation of a scalar quadratic form. To bound this further, denote by $\|\cdot\|_{L^2}$ the $L^2$-norm defined by the law of $g$. Minkowski's integral inequality and the assumption $1 \leq p \leq 2$ imply that Substituting this into (45) gives the lower bound in (44) and completes the proof.
Applying this result to the random matrix $G_N = (g_{ij})_{i,j \leq N}$ conditionally on the disorder chaos shows that $\mathrm{GP}^\kappa_{N,p} = \mathrm{GP}_{N,p}$ for $1 \leq p \leq 2$. Theorem 1.1 is therefore an immediate consequence of theorem 1.1 and theorem 1.2 in [6]. This concludes our discussion of the $\ell^p$-Gaussian-Grothendieck problem for $1 \leq p \leq 2$.

The limit of the constrained Lagrangian
In this section we begin the proof of theorem 1.2 by combining the Gaussian concentration inequality with the Guerra-Toninelli interpolation to show that the random quantity (27) almost surely admits a deterministic limit for every constraint D ∈ Γ κ . As usual [13,29], the proof will come down to proving super-additivity of an appropriate sequence and appealing to the classical Fekete lemma.
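The Fekete lemma invoked here states that if a sequence $(a_N)$ is super-additive, $a_{N+M} \geq a_N + a_M$, then $a_N / N$ converges to $\sup_N a_N / N$. A toy numerical illustration with a hypothetical super-additive sequence (unrelated to the Lagrangian):

```python
import math

# Toy super-additive sequence: a(N) = 2N - sqrt(N). It is super-additive
# because sqrt is subadditive: sqrt(N + M) <= sqrt(N) + sqrt(M).
def a(N):
    return 2.0 * N - math.sqrt(N)

# check super-additivity a(N + M) >= a(N) + a(M) on a grid
for N in range(1, 50):
    for M in range(1, 50):
        assert a(N + M) >= a(N) + a(M) - 1e-12

# Fekete: a(N)/N converges to sup_N a(N)/N, which here is the limit 2
ratios = [a(N) / N for N in range(1, 10001)]
assert max(ratios) <= 2.0
assert abs(ratios[-1] - 2.0) < 0.02
```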
Proof. We will be working with systems of different sizes, so let us make the dependence of (26) on $N$ explicit by writing $\Sigma_N(D)$. Given $\vec{\boldsymbol{\sigma}} \in \Sigma_N(D)$, the covariance structure of the vector spin Hamiltonian (12) established in (19) together with lemma B.3 reveal that It follows by the Gaussian concentration of the maximum that for any $s > 0$, Since the right-hand side of this expression is summable in $N$, the Borel-Cantelli lemma implies that lim sup with probability one. It is therefore sufficient to prove that the sequence $(\mathbb{E} L_{N,p,D}(t))_N$ admits a limit. We will do this through the Fekete lemma by showing that the sequence $(N \mathbb{E} L_{N,p,D}(t))_N$ is super-additive. This is equivalent to proving that for all integers $N, M \geq 1$, Given a spin configuration $\vec{\boldsymbol{\rho}} = (\vec{\boldsymbol{\sigma}}, \vec{\boldsymbol{\tau}}) \in \Sigma_{N+M}(D)$ obtained by concatenating $\vec{\boldsymbol{\sigma}} \in \Sigma_N(D)$ and $\vec{\boldsymbol{\tau}} \in \Sigma_M(D)$, an inverse temperature parameter $\beta > 0$ and two probability measures $\mu_N$ and $\mu_M$ supported on $\Sigma_N(D)$ and $\Sigma_M(D)$ respectively, denote by the interpolating free energy and write $\langle \cdot \rangle_s$ for the Gibbs average with respect to the interpolating Gibbs measure The Gaussian integration by parts formula (see for instance lemma 1.1 in [29]) yields the convexity of the square of a norm implies that $C(\vec{\boldsymbol{\rho}}_1, \vec{\boldsymbol{\rho}}_2) \leq 0$. Combined with the fact that $R(\vec{\boldsymbol{\rho}}_1, \vec{\boldsymbol{\rho}}_1) = R(\vec{\boldsymbol{\sigma}}_1, \vec{\boldsymbol{\sigma}}_1) = R(\vec{\boldsymbol{\tau}}_1, \vec{\boldsymbol{\tau}}_1) = D$, this shows that $\varphi'(s) \geq 0$ and therefore $\varphi(0) \leq \varphi(1)$. Letting $\beta \to \infty$ in this inequality and remembering that the $L^q$-norm tends to the $L^\infty$-norm as $q \to \infty$ yields (47) and completes the proof.
The heuristic validity of theorem 1.2 should now be clear. From (18), the self-overlap of any vector spin configuration is a Gram matrix in Γ κ . This means that for every integer N ≥ 1, the relationship between the unconstrained Lagrangian (23) and the constrained Lagrangian (27) is Formally bringing the limit into the supremum and using the density of positive definite matrices in the space of non-negative definite matrices yields (28). To turn this heuristic into a rigorous argument, we will use a compactness argument. This will be done in section 5 and will require the continuity properties of the constrained Lagrangian (27) that we explore in the next section.

Continuity of the constrained Lagrangian
In this section we prove that, in a certain sense, the limit of the constrained Lagrangian (27) is continuous with respect to the constraint $D \in \Gamma_\kappa$ by combining lemma 4 in [33] with the classical Dudley inequality as it is stated in equation (A.23) of [35]. Lemma 4 in [33] was originally designed to modify the vector spin coordinates in the mixed p-spin model in order to prove the matrix-overlap Ghirlanda-Guerra identities. Using these identities, it is then possible to access the synchronization mechanism [31,32] and find a tight lower bound for the limit of the free energy through the Aizenman-Sims-Starr scheme [22,33]. We will apply this lemma for a different purpose, and, as it turns out, we will need a more explicit expression for the constant $L > 0$ appearing in the upper bound. For our purposes, it will be important that this constant is uniformly bounded for all $D \in \Gamma_\kappa$ with uniformly bounded trace. We will therefore repeat the proof of this result and carefully track the dependence of constants.
For each ε > 0 and D ∈ Γ κ denote by B ε (D) the open ε-neighbourhood of D, with respect to the sup-norm γ ∞ = max k,k ′ |γ k,k ′ | on the space of κ × κ matrices, and consider the set of spin configurations with self-overlap in the ε-neighbourhood of D. Denote by λ 1 ≥ · · · ≥ λ κ the real and non-negative eigenvalues of D and let D = QΛQ T (51) be the eigendecomposition of D with diagonal matrix Λ = diag(λ 1 , . . . , λ κ ). Given ε > 0, let 0 ≤ m ≤ κ be such that λ m ≥ √ ε and λ m+1 < √ ε. Introduce the matrix where Λ ε = diag(λ 1 , . . ., λ m , 0, . . ., 0). Given any #» σ σ σ ∈ Σ ε (D), we will construct a κ × κ matrix A #» σ σ σ such that the self-overlap of the configuration is equal to D ε and such that, in a certain sense, A #» σ σ σ has small distortion. Notice that the self-overlap of A #» σ σ σ #» σ σ σ is given by so we will need a matrix with In this context, small distortion will mean that the overlap of #» σ σ σ with other configurations should not change much when #» σ σ σ is replaced by σ σ σ and observe that by the Cauchy-Schwarz inequality where the last inequality follows from the fact that The construction of the matrix A #» σ σ σ is precisely the content of lemma 4 in [33].
for some constant C > 0 that depends only on κ.
Proof. The proof proceeds in two steps: first we reduce the problem to the case when D = Λ and then we use Gershgorin's theorem to conclude. For the reader's convenience, Gershgorin's theorem has been transcribed as theorem B.1 in the appendix.
Step 1: reducing to D = Λ Let us suppose temporarily that the result holds when D is a diagonal matrix. Since Q is an orthogonal matrix and the Hilbert-Schmidt norm is rotationally invariant, We may therefore find a matrix A( The last inequality uses the cyclicity of the trace, the orthogonality of Q and the fact that tr(D) = tr(Λ). This shows that it suffices to prove the result when D = Λ and R ∈ B ε (Λ).
Step 2: proof for $D = \Lambda$. Introduce the matrices $R_m = (R_{k,k'})_{k,k' \leq m}$ and $\Lambda_m = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ consisting of the first $m$ rows and columns of $R$ and $\Lambda$ respectively. Consider the matrix $\tilde{R}_m$. Gershgorin's theorem implies that all the eigenvalues of $\tilde{R}_m$ are within $m\sqrt{\varepsilon}$ of 1. The assumption $\varepsilon < \kappa^{-2}$ implies that $\tilde{R}_m$ is invertible and allows us to define the matrix $B$. Using the fact that Since the eigenvalues of $\tilde{R}_m$ are within $m\sqrt{\varepsilon}$ of 1, so are the eigenvalues of $\tilde{R}_m^{1/2}$. Observe that $\tilde{R}_m^{1/2}$ is symmetric and non-negative definite, so it admits an eigendecomposition It follows by the orthogonality of $\tilde{Q}_m$ that The cyclicity of the trace, the Cauchy-Schwarz inequality and lemma B.3 now give Finally, define the matrix $A$ by filling all rows and columns of $B$ from $m+1$ to $\kappa$ with zeros. It is clear that $A R A^T = \Lambda_\varepsilon$. If we denote by $T = (R_{k,k'})_{k,k' \geq m+1}$ the matrix consisting of the last $\kappa - m$ rows and columns of $R$, then We have used the fact that $T \in B_\varepsilon(\Lambda)$ in the second inequality. This completes the proof.
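The way Gershgorin's theorem enters Step 2 can be illustrated numerically: a symmetric matrix with unit diagonal and off-diagonal entries of size at most $\sqrt{\varepsilon}$ has all eigenvalues within $m\sqrt{\varepsilon}$ of 1, and the same control passes to its square root. A sketch with arbitrary illustrative values of $m$ and $\varepsilon$ (not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
m, eps = 4, 0.01                  # illustrative values only

# Symmetric matrix with unit diagonal and off-diagonal entries of
# absolute value at most sqrt(eps).
off = np.sqrt(eps) * (2.0 * rng.random((m, m)) - 1.0)
R = (off + off.T) / 2.0
np.fill_diagonal(R, 1.0)

# Gershgorin: every eigenvalue lies in a disc centred at a diagonal
# entry (here 1) with radius the sum of the absolute off-diagonal
# entries of its row, at most (m - 1) * sqrt(eps) <= m * sqrt(eps).
radius = m * np.sqrt(eps)
eigs = np.linalg.eigvalsh(R)
assert np.all(np.abs(eigs - 1.0) <= radius)

# The same control passes to the square root, since
# |sqrt(x) - 1| = |x - 1| / (sqrt(x) + 1) <= |x - 1| for x >= 0.
assert np.all(np.abs(np.sqrt(eigs) - 1.0) <= radius)
```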
This result allows us to map any spin configuration $\vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N$ with self-overlap in the $\varepsilon$-neighbourhood of $D \in \Gamma_\kappa$ to a modified spin configuration $A_{\vec{\boldsymbol{\sigma}}}\vec{\boldsymbol{\sigma}}$ that is not too far from $\vec{\boldsymbol{\sigma}}$ and has a configuration-independent self-overlap $D_\varepsilon$. These two facts will be fundamental to understanding the continuity of the constrained Lagrangian (27). We will now quantify the distance between $\vec{\boldsymbol{\sigma}}$ and $A_{\vec{\boldsymbol{\sigma}}}\vec{\boldsymbol{\sigma}}$ in two different ways: with respect to the normalized-$\ell^{2,2}$-norm and relative to the canonical metric associated with the Hamiltonian (12). It will be convenient to write
$$B^N_2(u) = \big\{ \vec{\boldsymbol{\sigma}} \in (\mathbb{R}^\kappa)^N \,\big|\, |||\vec{\boldsymbol{\sigma}}|||_{2,2}^2 \leq u \big\}$$
for the ball of radius $\sqrt{u}$ with respect to the normalized-$\ell^{2,2}$-norm.
where C > 0 is a constant that depends only on κ.
Proof. By the reverse triangle inequality, To bound this further, notice that by (18) and the Cauchy-Schwarz inequality, Taking square roots yields (60). To prove (61), observe that for any Invoking corollary 4.2 and (60) implies that d( Combining corollary 4.2 and corollary 4.3 with Dudley's entropy inequality, we will now show that, in a certain sense, the constrained Lagrangian (27) is continuous with respect to the constraint D ∈ Γ κ . To state this continuity result precisely, for each ε > 0 and D ∈ Γ κ introduce the relaxed constrained Lagrangian for some constant C > 0 that depends only on κ.
Proof. To simplify notation, let C > 0 denote a constant that depends only on κ whose value might not be the same at each occurrence. By the Gaussian concentration of the maximum and a simple application of the Borel-Cantelli lemma, it suffices to prove that lim sup To simplify notation, let u = tr(D) (62). Invoking corollary 4.3 and corollary 4.2 gives where To bound the first of these terms, for each ε > 0 denote by N (A, d, ε) the ε-covering number of the set A ⊂ (R κ ) N with respect to the metric d on (R κ ) N , and write B N for the Euclidean unit ball in (R κ ) N . Dudley's entropy inequality and corollary 4.3 imply that At this point, recall that the covering number of the Euclidean unit ball Nκ for every ε > 0. A proof of this bound may be found in corollary 4.2.13 of [38]. Combining this with a change of variables reveals that To bound the term (II), notice that for any x, y > 0, The second inequality uses the fact that ℓ 2,2 is continuously embedded in ℓ p,2 for p > 2. Since this bound holds trivially when Substituting (67) and (71) into (66) and letting N → ∞ yields (65). This completes the proof.
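To see why the covering-number bound $(3/\varepsilon)^{N\kappa}$ combined with Dudley's entropy inequality produces a bound of order $\sqrt{N\kappa}$, note that the entropy integral factorizes as $\sqrt{d}\,\int_0^1 \sqrt{\log(3/\varepsilon)}\, d\varepsilon$ in dimension $d = N\kappa$, and the remaining dimension-free integral is a finite constant. A quick numerical evaluation of that constant (midpoint rule; its exact value plays no role in the argument):

```python
import math

# Dudley-type entropy integral for the Euclidean unit ball in dimension d,
# using the covering-number bound N(eps) <= (3/eps)^d:
#   int_0^1 sqrt(log N(eps)) d(eps) = sqrt(d) * int_0^1 sqrt(log(3/eps)) d(eps).
# The dimension-free factor is a finite constant, computed here numerically.
def entropy_integral(n_steps=100000):
    h = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        eps = (k + 0.5) * h      # midpoint rule; the integrand is integrable at 0
        total += math.sqrt(math.log(3.0 / eps)) * h
    return total

c = entropy_integral()
assert 1.0 < c < 2.0   # a finite constant, so the bound grows like sqrt(d)
```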
In the heuristic proof of theorem 1.2 given at the end of section 3, we used the density of positive definite matrices in the space of non-negative definite matrices to obtain the second equality in (28). When we come to the rigorous proof of this equality, the argument will be more subtle as proposition 4.4 does not quite give continuity. We will instead content ourselves with controlling the limit of the constrained Lagrangian (27) for a non-negative definite matrix D ∈ Γ κ by that for some positive definite matrix in Γ + κ through the following bound.
for some constant C > 0 that depends only on κ.
The results established in this section together with the arguments in section 7 of [6] will allow us to give a rigorous proof of theorem 1.2. The proof will consist of two key steps. First, we will use proposition 4.4 to express a version of the Lagrangian (23) localized to a ball of fixed but arbitrary radius u > 0 as a supremum of constrained Lagrangians (27). Then, we will modify the scaling arguments in section 7 of [6] to show that the unconstrained Lagrangian (23) can be obtained by taking the supremum of these localized Lagrangians over all radii u > 0. The formula obtained by taking these successive suprema will be equivalent to the first equality in (28). As previously mentioned, the second equality will follow immediately from proposition 4.5. The purpose of restricting the supremum to positive definite matrices is technical and will be emphasized when we prove lemma 11.2.

The limit of the unconstrained Lagrangian
In this section we combine proposition 4.4 with the arguments in section 7 of [6] to prove theorem 1.2. As explained at the end of section 4, we will first find a formula for the limit of the localized Lagrangian defined for each u > 0. If Γ κ,u denotes the set of matrices in Γ κ with trace at most u, then (57) implies that for every t > 0, A compactness argument similar to that in lemma 3 of [33] can be used to show that this equality is preserved in the limit.
exists and is given by Moreover, with probability one, L p, Proof. Given ε > 0, observe that the collection of sets B ε (D) for D ∈ Γ κ,u forms an open cover of the compact set Γ κ,u . It is therefore possible to find integer n ≥ 1 and D 1 , . . . , D n ∈ Γ κ,u with With this in mind, given a probability measure µ N supported on B N 2 (u), an inverse temperature parameter β > 0 and a subset S ⊂ B N 2 (u), consider the free energy By monotonicity of the logarithm and the inclusion The Gaussian concentration inequality implies that the free energy F β N (S) deviates from its expectation by more than 1/ √ N with probability at most Le −N/L , where the constant L does not depend on β , N or S. We deduce from this that with probability at least 1 − Le −N/L , Letting β → ∞ and remembering that the L q -norm converges to the L ∞ -norm reveals that with probability at least 1 − Le −N/L , The Borel-Cantelli lemma and proposition 4.4 now give a constant C > 0 that depends only on κ with lim sup Remembering that L N,p,D (t) ≤ L N,p,u (t) for every N ≥ 1 and D ∈ Γ κ,u , it follows that Letting ε → 0 and using the Gaussian concentration of the maximum completes the proof.
This result reduces the proof of theorem 1.2 to establishing the asymptotic version of the equality This will be done using the techniques in section 7 of [6] and relying upon the identity which holds for every $t, u > 0$ by a change of variables. The absence of such an equality at the level of the constrained Lagrangian (27) is the reason we had to develop the results in section 4. For technical reasons, before we start thinking about proving the asymptotic version of (77), we will have to upgrade the statement of proposition 5.1 to show that $L_{p,u}(t)$ is the limit of the localized Lagrangian (73) with probability one simultaneously over all $t, u > 0$. Heuristically, this should not be too surprising. As the maximum of a collection of concave functions, the localized Lagrangian (78) is concave in the pair $(u,t)$ conditionally on the disorder chaos $(g_{ij})$. Since a concave function is Lipschitz continuous on compact sets, this suggests that $(u,t) \mapsto L_{N,p,u}(t)$ should be Lipschitz continuous on compact sets. This continuity would immediately promote almost sure convergence for each $t, u > 0$ to a convergence with probability one simultaneously over all $t, u > 0$. To make this argument rigorous, we will use an $\ell^2$-boundedness result for the $N \times N$ random matrix $G_N$. Its proof will rely upon Chevet's inequality as it appears in theorem 8.7.1 of [38]. Since $\mathbb{E}(G_N x, x)^2 = 1$ whenever $\|x\|_2 = 1$, the Gaussian concentration of the maximum gives a constant $C > 0$ such that with probability at least If $g$ is a standard Gaussian vector in $\mathbb{R}^N$, then Chevet's inequality applied with $T$ and $S$ equal to the Euclidean unit ball in $\mathbb{R}^N$ gives an absolute constant $M > 0$ with We have used the fact that the Gaussian width of the unit ball is $\mathbb{E} \sup_{\|x\|_2 = 1} (g, x) = \mathbb{E} \|g\|_2$ while its radius is one. Finally, Jensen's inequality reveals that Substituting this into (81) and redefining the constant $M > 0$ completes the proof.
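The content of this $\ell^2$-boundedness can be sanity-checked numerically: the supremum of $(G_N x, x)$ over the Euclidean unit sphere is the largest eigenvalue of the symmetrized matrix, and after dividing the matrix by $\sqrt{N}$ it stays of order one as $N$ grows. A sketch (the constant $\sqrt{2}$ suggested by Wigner asymptotics, and the tolerance window, are heuristic and not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# The supremum of (G x, x) over the Euclidean unit sphere equals the
# largest eigenvalue of the symmetrized matrix (G + G^T)/2. With a
# 1/sqrt(N) normalization this supremum stays bounded uniformly in N,
# i.e. the unnormalized supremum is O(sqrt(N)).
def sup_quadratic_form(N):
    G = rng.standard_normal((N, N)) / np.sqrt(N)
    return np.linalg.eigvalsh((G + G.T) / 2).max()

vals = [sup_quadratic_form(N) for N in (100, 200, 400)]
# Wigner heuristic: the top eigenvalue concentrates near sqrt(2) here.
assert all(1.1 < v < 1.8 for v in vals)
```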
Proof. Let #» ρ ρ ρ ∈ B N 2 (1) maximize the right-hand side of (78), and define the vector spin configuration The Cauchy-Schwarz inequality shows that Rearranging and using the fact that p > 2 gives It follows by (78), the Cauchy-Schwarz inequality and the mean value theorem that for any u ′ ,t ′ ∈ [K 1 , K 2 ], for some constant M > 0 that depends only on K 1 , K 2 and p. Interchanging the roles of u, u ′ and t,t ′ , it is easy to see that Invoking lemma 5.2 and redefining the constant M completes the proof.
Proof. By lemma 5.3 and a simple application of the Borel-Cantelli lemma, for any 0 < K 1 < K 2 there exists some constant M = M(K 1 , K 2 ) such that almost surely for all u, u ′ ,t,t ′ ∈ [K 1 , K 2 ]. Since L p,u (t) is a deterministic quantity, we also have for all u, u ′ ,t,t ′ ∈ [K 1 , K 2 ]. By countability of rationals and proposition 5.1, we can find a set Ω of probability one where (84) holds simultaneously for all rationals K 1 , K 2 ∈ Q + and at the same time L p,u (t) = lim N→∞ L N,p,u (t) for all u,t ∈ Q + . The triangle inequality implies that for any u,t > 0 and u ′ ,t ′ ∈ Q + , It follows by (85) that on the set Ω, Letting u ′ → u and t ′ → t along rational points completes the proof.
In addition to proposition 5.4, the proof of theorem 1.2 will rely on the fact that the ℓ p,2 -norm potential in the definition of the Hamiltonian (22) forces the maximizers of this random function to concentrate in a large enough neighbourhood of the origin with overwhelming probability.
Lemma 5.5. If 2 < p < ∞, then there exist constants C, M > 0 such that with probability at least for all t > 0.
It follows by Jensen's inequality that Invoking lemma 5.2 completes the proof.

for every $t > 0$ and $u > 0$. Taking the supremum over all $u > 0$ gives the almost sure existence of $L_p(t)$, and invoking proposition 5.1 shows that To establish the second equality in (28), fix a non-negative definite matrix $D \in \Gamma_{\kappa,u}$ as well as $0 < \varepsilon < \kappa^{-2}$. By proposition 4.5, there exists a constant $K > 0$ that depends only on $\kappa$ such that It is readily verified that $D + \varepsilon I \in \Gamma^+_\kappa$, so in fact Taking the supremum over all $D \in \Gamma_{\kappa,u}$, letting $\varepsilon \to 0$ and remembering (87) completes the proof.

The ground state energy in terms of the Lagrangian

In section 5 we proved the first noteworthy result of this paper by expressing the unconstrained Lagrangian (23) as a supremum of constrained Lagrangians (27) in the limit. As we will see in section 7 and section 8, the constrained Lagrangian (27) can be understood using the results in [33]. It is for this reason that we constrained the Lagrangian (23) in the first place. However, the task that we originally set ourselves is understanding the $\ell^p$-Gaussian-Grothendieck problem with vector spins (8). In this section we connect the unconstrained Lagrangian (23) and the ground state energy (21) by proving theorem 1.3. This will reduce the $\ell^p$-Gaussian-Grothendieck problem with vector spins to understanding the asymptotic behaviour of the constrained Lagrangian (27). Before we proceed with the proof of theorem 1.3, we give a formal argument that will motivate the results in this section. Given $N \in \mathbb{N}$ and $t > 0$, let $\vec{\boldsymbol{\rho}}(t)$ be a point at which the Hamiltonian $H_{N,p,t}$ defined in (22) attains its supremum. Differentiating the expression $L_{N,p}(t) = \frac{1}{N} H_{N,p,t}(\vec{\boldsymbol{\rho}}(t))$ with respect to $t$, and using the fact that $\nabla_{\vec{\boldsymbol{\sigma}}} H_{N,p,t}(\vec{\boldsymbol{\rho}}(t)) = 0$, suggests that and therefore To express this ground state energy entirely in terms of the unconstrained Lagrangian (23) as in theorem 1.3, we compute the gradient of the Hamiltonian (22).
Since our calculation will be rigorous, we formulate it as a lemma.
Lemma 6.1. If #» σ σ σ ∈ (R κ ) N and t, u > 0, then, conditionally on the disorder chaos (g i j ), Proof. Given 1 ≤ i ≤ N and 1 ≤ k ≤ κ, a simple computation shows that It follows that This finishes the proof.
This simple calculation suggests that , which combined with (88) gives . Substituting this into (90) gives (29) upon letting N → ∞. The problem with this argument is that the map t ↦ ρ⃗(t) might not be differentiable. To overcome this issue, we will prove (93) directly at the points of differentiability of L_{N,p}(t), and then use a convexity argument to deduce that (93) holds for every t > 0 in the limit.

Lemma 6.2. If (g_{ij}) is a realization of the disorder chaos for which the unconstrained Lagrangian L_{N,p} is differentiable at t > 0, then

Proof. Fix ε > 0 and λ > 0. For any configuration with |||σ⃗||| . Similarly, for any configuration with |||σ⃗||| . The differentiability of L_{N,p} at t gives λ = λ(ε) > 0 small enough so that . This means that an optimizer . It follows by lemma 6.1 that . Rearranging completes the proof.
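The envelope (Danskin-type) principle behind lemma 6.2 can be illustrated with a toy finite maximum: for L(t) = max_i (t·a_i + b_i), at any differentiability point the derivative equals a_{i*} for a maximizer i*. The arrays below are arbitrary placeholders, not objects from the paper; this is a numerical sanity check, not part of the argument.

```python
import numpy as np

# Danskin/envelope principle: for L(t) = max_i (t*a[i] + b[i]), at a point t
# where L is differentiable, L'(t) = a[i*] for any maximizer i* at t.
rng = np.random.default_rng(6)
a = rng.standard_normal(50)
b = rng.standard_normal(50)

def L(t):
    # pointwise maximum of finitely many affine functions of t
    return np.max(t * a + b)

t, h = 0.7, 1e-6
i_star = int(np.argmax(t * a + b))            # a maximizer at t
num_deriv = (L(t + h) - L(t - h)) / (2 * h)   # central difference
```

For generic data the maximizer at t is unique, so the numerical derivative matches the slope a[i_star] of the active affine piece.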
Proof. Using theorem 1.2, fix a realization (g_{ij}) of the disorder chaos for which L_{N,p}(t) converges to L_p(t) for all t > 0. Notice that L_{N,p} and L_p are convex functions. In particular, they are continuous everywhere on (0, ∞) and differentiable almost everywhere on (0, ∞). If 0 < t_1 < s < t_2 are such that L_{N,p} is differentiable at s, then the convexity of L_{N,p} gives , and lemma 6.2 yields . By continuity of L_{N,p} and density of the points of differentiability of L_{N,p} in (0, ∞), this inequality implies that for all 0 < t_1 < t < t_2 < ∞, . Letting N → ∞ and then letting t_1 ↗ t and t_2 ↘ t shows that at any point t ∈ (0, ∞) of differentiability of L_p, . We will now use this equality to show that L_p is differentiable everywhere on (0, ∞). By convexity of L_p and theorem 25.1 in [34], it suffices to prove that the sub-differential ∂L_p(t) consists of a single point for every t > 0. Fix t ∈ (0, ∞) as well as a ∈ ∂L_p(t), and let (s_k) and (t_k) be points of differentiability of L_p with t_k ↗ t and s_k ↘ t. By definition of the sub-differential, for every integer k ≥ 1, . Letting k → ∞ and combining (96) with the continuity of L_p yields . This completes the proof.
To leverage this result into a proof of theorem 1.3, we must verify the legitimacy of the change of variables used in (90). In other words, we must show that L′_p(t) does not vanish on (0, ∞). Our proof will rely upon the properties of the eigenvalues and eigenvectors of the Gaussian orthogonal ensemble discussed in chapter 2 of [2]. Recall the definition of the random matrix G_N in (79), and notice that the N × N random matrix Ḡ_N is distributed according to the Gaussian orthogonal ensemble.
Proof. Given σ ∈ R^N, consider the vector spin configuration σ⃗ ∈ (R^κ)^N defined by . With this in mind, let v denote the ℓ^2-normalized eigenvector associated with the largest eigenvalue λ_N^N of the Gaussian orthogonal ensemble Ḡ_N. Given δ > 0, applying (98) to the spin configuration σ_δ = √(Nδ) v reveals that . By corollary 2.5.4 in [2], the eigenvector v is equal in distribution to g/∥g∥_2 for a standard Gaussian random vector g in R^N. Moreover, by the strong law of large numbers, almost surely . Together with the asymptotics of λ_N^N established in theorem 2.1.22 of [2], this implies that . Taking δ > 0 small enough and using the fact that p > 2 shows that L′_p is strictly positive on (0, ∞). Invoking lemma 6.3 completes the proof.
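The GOE asymptotics invoked here (theorem 2.1.22 of [2]) say that the largest eigenvalue of an N × N GOE matrix is of order 2√N. A quick numerical check, separate from the proof; the normalization assumed below (off-diagonal variance 1, diagonal variance 2) matches the standard GOE convention:

```python
import numpy as np

# Largest GOE eigenvalue ~ 2*sqrt(N): symmetrize an iid Gaussian matrix so
# that off-diagonal entries have variance 1 and diagonal entries variance 2.
rng = np.random.default_rng(0)
N = 500
A = rng.standard_normal((N, N))
G = (A + A.T) / np.sqrt(2.0)            # GOE sample
lam_max = np.linalg.eigvalsh(G)[-1]     # largest eigenvalue
ratio = lam_max / (2.0 * np.sqrt(N))    # should be close to 1 for large N
```

Finite-size corrections are of order N^(-2/3), so for N = 500 the ratio already sits within a few percent of 1.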
Proof (Theorem 1.3). Using theorem 1.2, fix a realization (g_{ij}) of the disorder chaos for which L_{N,p}(t) converges to L_p(t) for all t > 0. Let Ω ⊂ (0, ∞) be the collection of points at which L_{N,p} is differentiable for all N ≥ 1. Fix t ∈ Ω, and notice that by convexity of L_{N,p}, for every h ∈ R, . By lemma 6.2, the sequence (L′_{N,p}(t))_N is uniformly bounded. It therefore admits a subsequential limit a. Letting N → ∞ in (99) shows that a belongs to the sub-differential ∂L_p(t).
Invoking lemma 6.3 shows that a = L′_p(t), and therefore L′_{N,p}(t) → L′_p(t). It follows by lemma 6.4 that L′_{N,p}(t) > 0 for large enough N, so . Since Ω is dense in (0, ∞) and L′_{N,p} is continuous, this equality extends to all t > 0. Letting N → ∞ and using lemma 6.3 completes the proof.

7 Replacing the constrained Lagrangian by a free energy

So far, we have reduced the ℓ^p-Gaussian-Grothendieck problem with vector spins to understanding the asymptotic behaviour of the constrained Lagrangian (27) with positive definite constraints. This task will occupy the remainder of the paper. The starting point of our analysis is the Parisi-type variational formula for free energy functionals established in [33]. To access this result, we must first replace the constrained Lagrangian by a free energy functional. In this section, given a constraint D ∈ Γ_κ which is fixed throughout, we introduce a free energy functional that depends on an inverse temperature parameter β > 0 and is asymptotically equivalent to the constrained Lagrangian (27) upon letting β → ∞.
For each inverse temperature parameter β > 0 and every ε > 0, consider the free energy and the quenched free energy . Recall the definition of the relaxed constrained Lagrangian in (63). Since the L^q-norm converges to the L^∞-norm as q → ∞, it is clear that . We will now use the continuity result in proposition 4.4 to show that the right-hand side of this equation coincides with the limit of the constrained Lagrangian (27). Subsequently, we will prove that (102) still holds if the limit in β is taken after the limits in ε and N. The benefit of exchanging these limits is that the main result in [33] gives a Parisi-type variational formula for the limit in ε and N of the quenched free energy (101) for each fixed β > 0. In section 9 and section 10 we will study this formula in the limit β → ∞ to finally prove theorem 1.4 in section 11.
On the other hand, the Gaussian concentration of the maximum reveals that for every ε > 0,

Proof. Fix δ ∈ (0, t), and for each t > 0 let ρ⃗_t be a maximizer of the relaxed constrained Lagrangian (63). By the Fubini-Tonelli theorem and a change of variables, . To bound this further, let A ∈ R^{κ×κ} be a symmetric and non-negative definite matrix with AA^T = κD, and denote by σ⃗_i ∈ R^κ the i-th column of A. Consider the subsequence M = Nκ, and define the κ-periodic vector spin configuration . If G_M denotes the M × M random matrix in (79), then the Cauchy-Schwarz inequality implies that for each t > 0,

Rearranging and remembering (57) gives
where λ_M^M denotes the largest eigenvalue of the Gaussian orthogonal ensemble Ḡ_M in (97). Substituting this into (107), appealing to the Gaussian concentration of the free energy and leveraging the asymptotics of λ_M^M established in theorem 2.1.22 of [2] shows that . We have implicitly used the fact that the limit of F_{N,ε}(β) exists and therefore coincides with that of F_{M,ε}(β). This can be shown using a Guerra-Toninelli argument as in theorem 3.1, or by appealing to the results in [33] as we will do in section 8. Letting ε → 0, then β → ∞ and finally δ → 0 completes the proof upon invoking proposition 7.1.
Proof. By lemma 7.2, it suffices to prove the upper bound in (109). Fix ε ∈ (0, 1), and let δ = ε/K for a large enough K > 0 to be determined. Consider the subsequence M = Nκ as in the proof of lemma 7.2, and let ρ⃗ ∈ Σ_δ(D) be a maximizer of the relaxed constrained Lagrangian L_{M,p,D,δ}(t) in (63). Introduce the δ/√κ-neighbourhood, . Indeed, the same argument used to obtain (54) implies that, provided K = K(κ) is large enough, ; the second inequality uses (57). This means that . To bound this further, fix σ⃗ ∈ C_{δ/√κ}(ρ⃗) and recall the definition of the M × M random matrix G_M in (79). The Cauchy-Schwarz inequality implies that . On the other hand, an identical argument to that used to obtain (69) shows that . Together with (111), (108) and lemma 5.2, this gives constants C, K′ > 0 that depend only on κ, D, p and t such that, with probability at least 1 − Ce^{−M/C}, . Substituting this lower bound into (110) and combining the Gaussian concentration of the free energy with the Borel-Cantelli lemma to let N → ∞ yields . Taking ε = β^{−1} and letting β → ∞ completes the proof upon invoking proposition 7.1.

The limit of the free energy
In this section we describe the implications of the main result in [33] for the asymptotic representation of the constrained Lagrangian (27) established in theorem 7.3. Given a constraint D ∈ Γ_κ, some t > 0 and an inverse temperature parameter β > 0, all of which remain fixed throughout this section, consider the measure on R^κ defined by . Notice that the quenched free energy (101) may be written as . If it were not for the fact that the measure µ in (112) is not compactly supported, this free energy functional would fall into the class of free energy functionals studied in [33]. Fortunately, the compact support assumption in [33] is not necessary; rather, it is a convenient assumption ensuring that all objects introduced are well-defined and that spin configurations in the set Σ_ε(D) remain bounded. Replicating the arguments in [33], it is not hard to use the fact that the measure (112) exhibits super-Gaussian decay in the range 2 < p < ∞ to show that the analogue of the Parisi formula with vector spins proved in [33] for compactly supported measures also holds for the free energy functional (113). We will not give any more details than this, and simply content ourselves with stating the asymptotic formula for (113), which we will use between section 9 and section 11 to prove theorem 1.4.

Denote by M_d the set of probability distributions on [0, 1] with finitely many atoms. Notice that any discrete measure α ∈ M_d may be identified with two sequences of parameters . Given a path π ∈ Π_D defined by the sequences (33) and (34), for each 0 ≤ j ≤ r consider an independent Gaussian vector z_j = (z_j(k))_{k≤κ} with covariance structure (39). Define the sequence (X_l^{λ,α,π,β})_{0≤l≤r} recursively as follows. Let , and for 0 ≤ l ≤ r − 1 let . This inductive procedure is well-defined by the growth bounds in lemma A.1. Introduce the Parisi functional, . Observe that , where we have abused notation by writing α both for the measure and its cumulative distribution function.
The Parisi functional may therefore be expressed as We have made all dependencies on D, p and t implicit for clarity of notation, but we will make them explicit whenever necessary. The proof of theorem 1.4 will leverage the following consequence of the main result in [33].
where the infimum is taken over all (λ, α, π) ∈ R^{κ(κ+1)/2} × M_d × Π_D. This result can be viewed as a positive temperature analogue of theorem 1.4. Together with theorem 7.3, it essentially reduces the proof of theorem 1.4 to showing that . Notice the similarity between the Parisi functionals (121) and (42). If it were not for the terms X_0 and Y_0 in (118) and (41), there would be a natural correspondence between these two functionals obtained by setting ζ = βα. The proof of (123) will therefore consist in showing that, when evaluated at almost minimizers, the terms X_0 and Y_0 in (118) and (41) differ by a quantity that vanishes as β → ∞. To control this difference, we will compare the terminal conditions (116) and (36) in section 9. We will then use the Auffinger-Chen representation [4,16] in section 10 to translate the bounds on the terminal conditions into control on X_0 and Y_0. This analysis will be exploited in section 11 to establish (123) and thereby prove theorem 1.4. This strategy is considerably different from that in [6], where the free energy functional (113) is truncated at some level M > 0. The truncation simplifies much of the analysis for fixed M > 0, but requires a great deal of care when sending M → ∞. By not truncating the free energy, we simplify and shorten the proof of theorem 1.4 even in the scalar case κ = 1 studied in [6].
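The recursive construction of (X_l) is the standard Parisi cascade: starting from a terminal condition at level r, each step averages out one Gaussian increment at a tilted exponential scale. A schematic scalar (κ = 1) version, with toy placeholder choices of terminal condition, sequences q, α and inverse temperature β (none of which are the paper's objects):

```python
import numpy as np

# Scalar sketch of the Parisi recursion: X_l(x) = (1/zeta_l) * log E exp(
# zeta_l * X_{l+1}(x + sqrt(q_{l+1} - q_l) * g)), g ~ N(0,1), zeta_l = beta*alpha_l.
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
weights = weights / weights.sum()            # quadrature for E f(g), g ~ N(0,1)

def step_down(X_next, zeta, dq):
    def X_l(xs):
        out = []
        for x in np.atleast_1d(xs):
            vals = X_next(x + np.sqrt(dq) * nodes)
            m = vals.max()                   # stabilized log-sum-exp average
            out.append(m + np.log(weights @ np.exp(zeta * (vals - m))) / zeta)
        return np.array(out)
    return X_l

beta = 2.0
q = [0.0, 0.3, 0.7, 1.0]                     # 0 = q_0 < q_1 < q_2 < q_3 = 1
alpha = [0.2, 0.5, 0.9]                      # atoms of a toy discrete alpha
X = lambda xs: np.log(np.cosh(beta * np.atleast_1d(xs)))  # toy terminal condition
for l in reversed(range(len(alpha))):
    X = step_down(X, beta * alpha[l], q[l + 1] - q[l])
x0 = float(X(0.0)[0])                        # the analogue of X_0 at the origin
```

Only the shape of the recursion is meant to match (115)-(118); the vector spin version replaces scalars by κ-dimensional Gaussian increments with covariances γ_{j+1} − γ_j.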

Comparison of the terminal conditions
In this section we prove quantitative bounds on the difference between the terminal conditions (116) and (36). Although the analysis in this section uses only elementary concepts, it is the key to proving theorem 1.4; the rest of the paper will use tools from the literature to transform the bounds established here into a proof of theorem 1.4. To alleviate notation, the inverse temperature parameter β > 0, the Lagrange multiplier λ ∈ R^{κ(κ+1)/2} and the parameters t > 0 and 2 < p < ∞ will be fixed throughout this section. We will write C > 0 for a constant that depends only on κ, p and t whose value might not be the same at each occurrence.
We begin by bounding f_λ^β from above by f_λ^∞ up to a small error. It will be necessary to make the dependence of these terminal conditions on t > 0 explicit by writing f_{λ,t}^β and f_{λ,t}^∞.
This result will play its part when we prove the upper bound in (123), at which point we will have to replace f_{λ,t−δ}^∞ in (124) by f_{λ,t}^∞. This will be achieved through a continuity result that is an immediate consequence of the following bound on any maximizer σ⃗*_{x⃗,λ,t} of (36).
Lemma 9.2. If 2 < p < ∞ and x⃗ ∈ R^κ, then there exists σ⃗*_{x⃗,λ,t} ∈ R^κ which attains the maximum in (36). Moreover,

Lemma 9.4. If 2 < p < ∞, x⃗ ∈ R^κ and δ > 0, then

Proof. To simplify notation, for k > k′ let λ_{k,k′} = λ_{k′,k}. Recall the function g : R^κ → R in (126). Given δ > 0 and ρ⃗ ∈ R^κ, consider the function f : R^κ → R, , where the constant h(ρ⃗) depends on ρ⃗ and is given by . Similarly, for 1 ≤ k ≠ k′ ≤ κ, . It follows that . Invoking corollary B.2 shows that the Hessian of f is non-negative definite, and therefore f is convex. With this in mind, let X = (X_i)_{i≤κ} be a vector of independent random variables with X_i uniformly distributed on the interval [ρ(i) − δ, ρ(i) + δ]. Jensen's inequality implies that . Substituting the definition of f into the right-hand side of this inequality and integrating yields . Rearranging and taking ρ⃗ = σ⃗*_{x⃗,λ,t} completes the proof.
This result will play its part when we prove the lower bound in (123), at which point we will have to deal carefully with the fact that it only bounds f_λ^∞ by f_λ^β for values of x⃗ in a (possibly large) neighbourhood of the origin. Fortunately, this will not be a problem: the bound (131) will be applied to one of the Auffinger-Chen control processes introduced in the next section, and the generalization of the moment bound in lemma 12.3 of [6] to the vector spin setting, which corresponds to lemma 10.4 in this paper, will show that dominating f_λ^∞ by f_λ^β around the origin is sufficient for our purposes.

The Auffinger-Chen representation
In this section we extend the Auffinger-Chen stochastic control representation established for κ = 2 and Lipschitz terminal conditions in [7] to the setting of arbitrary integer κ ≥ 1 and terminal conditions with sub-quadratic growth such as (116) and (36). The results in this section will be combined with the bounds obtained in section 9 to compare the quantities X 0 and Y 0 in (118) and (41). This will lead to a proof of theorem 1.4 in section 11.
Throughout this section, a constraint D ∈ Γ_κ, an inverse temperature parameter β > 0, a Lagrange multiplier λ ∈ R^{κ(κ+1)/2}, a κ-dimensional Brownian motion W = (W_1, …, W_κ) and parameters t > 0 and 2 < p < ∞ will be fixed. We will also give ourselves a piecewise linear path π ∈ Π_D defined by the sequences (33) and (34), as well as a discrete probability distribution α ∈ M_d defined by the sequences (33) and (115). To prove the Auffinger-Chen representation, it will be convenient to replace the Gaussian random vectors z_j with covariance structure (39) appearing in the definition of the Parisi functional (121) by a continuous time stochastic process B = (B(s))_{s≥0} that plays the same role, . Since π′(r) = (q_j − q_{j−1})^{−1}(γ_j − γ_{j−1}) ∈ Γ_κ for r ∈ (q_{j−1}, q_j), this process is well-defined. Moreover, the Ito isometry shows that . If we introduce the function Φ : [0, 1] × R^κ → R, , then the independence of the increments of B and (134) imply that the Parisi functional (121) may be written as . We have made all dependencies of Φ on D, β, λ, α, π, p and t implicit for clarity of notation, but we will make them explicit whenever necessary. To obtain the Auffinger-Chen representation, we first use Gaussian integration by parts (see for instance lemma 1.1 in [29]) to show that Φ satisfies a non-linear parabolic PDE.
where ∂ s Φ is understood as the right-derivative at the points of discontinuity of α.
To compute the time derivative of Φ, let g be a standard Gaussian vector in R^κ and consider the function

Since Φ(s, ·) is obtained by evaluating this function at x⃗ + √(q_{j+1} − s) g, the Gaussian integration by parts formula gives

Remembering that α(s) = α_j completes the proof.
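The Ito isometry used earlier in this section asserts that B(s) = ∫_0^s √(π′(r)) dW(r) has covariance π(s). A Monte Carlo sanity check with a toy κ = 2 piecewise constant π′ (the matrices below are placeholders, not a path from Π_D):

```python
import numpy as np

rng = np.random.default_rng(3)
kappa, n_paths = 2, 200_000

def sqrtm(P):
    # symmetric matrix square root via the spectral decomposition
    w, V = np.linalg.eigh(P)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

# pi' is piecewise constant: P1 on [0, 1/2] and P2 on [1/2, 1]
P1 = np.array([[1.0, 0.0], [0.0, 0.2]])
P2 = np.array([[1.0, 0.8], [0.8, 1.0]])
B = np.zeros((n_paths, kappa))
for P in (P1, P2):
    dW = rng.standard_normal((n_paths, kappa)) * np.sqrt(0.5)  # increment of W
    B += dW @ sqrtm(P).T
cov = B.T @ B / n_paths                  # empirical covariance of B(1)
target = 0.5 * (P1 + P2)                 # pi(1) = int_0^1 pi'(r) dr
```

The empirical covariance of B(1) matches π(1) up to Monte Carlo error, which is the content of the isometry.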
The Hamilton-Jacobi equation (137) is the vector spin analogue of the Parisi PDE [29]. We now use ideas similar to those in [4,16,29] to obtain the vector spin analogue of the Auffinger-Chen representation from (137). To overcome the lack of Lipschitz continuity in the terminal condition (116), we will rely upon three classical results from stochastic analysis: the Ito formula, the Girsanov theorem and the Novikov condition [11,17]. Given a filtration F = (F_s)_{0≤s≤1}, it will be convenient to denote by A the class of admissible control processes, and

Proposition 10.2. If 2 < p < ∞, then there exists a probability space (Ω, F_1, P), a filtration F = (F_s)_{0≤s≤1}, a Brownian motion W = (W(s))_{0≤s≤1} and a continuous adapted process X = (X(s))_{0≤s≤1} which together form a weak solution to the stochastic differential equation . Moreover, , with the supremum attained by the admissible process v(s) = ∇Φ(s, X(s)).
Proof. To alleviate notation, let C > 0 denote a constant that depends only on κ, λ, α, π, β, D, p and t whose value might not be the same at each occurrence. An induction based on lemma A.1 and lemma A.3 can be used to show that for any (s, x⃗) ∈ (0, 1] × R^κ, (141) . With this in mind, consider the process L = (L(s))_{0≤s≤1}, . The growth bound (141) implies that , where the last inequality uses the fact that π′ is piecewise constant. Combining this with Doob's maximal inequality reveals that . It follows from the Novikov condition that the stochastic exponential E(L) is a martingale.
where Φ and its derivatives are evaluated at (s, Y(s)). The growth bound (141), the Cauchy-Schwarz inequality, the boundedness of π′ and the Ito isometry reveal that . This means that (√2 ∫_0^s ⟨∇Φ, π(r) dW(r)⟩)_{s≤1} is a martingale. Together with the non-negative definiteness of π′, this implies that , with equality for the process v(s) = ∇Φ(s, X(s)). Rearranging gives the lower bound in (140). To prove the matching upper bound, it suffices to show that v(s) = ∇Φ(s, X(s)) belongs to the admissible class A. Fix 0 < s ≤ r ≤ 1. By the triangle inequality, the Cauchy-Schwarz inequality and the growth bound (141), . On the other hand, Doob's maximal inequality and the Ito isometry yield . Substituting this into (142) and applying Gronwall's inequality to the resulting bound shows that E sup_{0≤s≤1} ∥X(s)∥_2^2 ≤ C. Invoking (141) one last time completes the proof.
Of course, an analogous result holds for the random variable Y_0 in (41). Given a discrete measure ζ ∈ N_d defined by the sequences (33) and (38), introduce the function Ψ : [0, 1] × R^κ → R, . The Parisi functional (42) may be written as , where ∂_s Ψ is understood as the right derivative at the points of discontinuity of ζ. An identical argument to that in proposition 10.2 gives the following weak form of the Auffinger-Chen representation.
Proposition 10.3. If 2 < p < ∞, then there exists a probability space (Ω, F_1, P), a filtration F = (F_s)_{0≤s≤1}, a Brownian motion W = (W(s))_{0≤s≤1} and a continuous adapted process X = (X(s))_{0≤s≤1} which together form a weak solution to the stochastic differential equation . Moreover, , with the supremum attained by the admissible process v(s) = ∇Ψ(s, X(s)).
We close this section with a moment bound on the weak solution to the stochastic differential equation (147), which will allow us to deal with the fact that proposition 9.5 only holds for bounded values of x⃗.
Lemma 10.4. If (X(s))_{0≤s≤1} is a weak solution to (147) and η = max(1 , then for some constant C > 0 that depends only on κ, p, t and D.

Proof. To alleviate notation, let C > 0 denote a constant that depends only on κ, p, t and D whose value might not be the same at each occurrence. If E∥X(1)∥_2^2 < 1 the result is trivial, so assume without loss of generality that E∥X(1)∥_2^2 ≥ 1. Introduce the process v(s) = ∇Ψ(s, X(s)) in such a way that

The triangle inequality and (143) reveal that
With this in mind, fix 1 ≤ l ≤ κ. The Cauchy-Schwarz inequality implies that . Substituting this back into (150) yields . On the other hand, taking the zero process in proposition 10.3 shows that , and therefore E∥X(1)∥_2^2 . To bound this further, observe that by (A.6) in appendix A and Jensen's inequality, , where we have used the assumption that E∥X(1)∥_2^2 ≥ 1. Substituting this back into (151) and again using the fact that E∥X(1)∥_2^2 ≥ 1 gives . Rearranging completes the proof.

Proof of the main result
In this section we finally prove theorem 1.4. The proof of the upper bound follows section 12.2 of [6]. The proof of the lower bound, on the other hand, is considerably shorter and less involved than its one-dimensional analogue in [6]; in particular, it leverages the results of section 9 to avoid the technicalities associated with truncation. Specializing our arguments to the scalar case κ = 1 gives a shorter and more direct proof of the main result in [6] when 2 < p < ∞.
It will be convenient to control the magnitude of λ_β. It is at this point that we have to specialize the claim of theorem 1.4 to positive definite matrices D ∈ Γ_κ^+. The author's inability to extend the following result to the space of non-negative definite matrices is the reason for proving the second equality in (28) and for extending section 4 beyond proposition 4.4.

B Background material
In this appendix we collect a number of elementary results from linear algebra.
Theorem B.1 (Gershgorin). If A ∈ R^{n×n} and R_i = ∑_{j≠i} |a_{ij}| is the sum of the absolute values of the non-diagonal entries in the i-th row of A, then the eigenvalues of A are all contained in the union of the Gershgorin discs,

Proof. See theorem 6.1.1 in [15].

Proof. Let λ_n ≥ λ_{n−1} ≥ ⋯ ≥ λ_1 ≥ 0 be the real and non-negative eigenvalues of the matrix A.
Since the trace of a matrix is the sum of its eigenvalues, . We have used the fact that the eigenvalues of A^2 are λ_n^2 ≥ λ_{n−1}^2 ≥ ⋯ ≥ λ_1^2 ≥ 0 in the third equality, and the non-negativity of the eigenvalues of A in the inequality. Invoking the assumption that B ≤ C yields tr(AB) ≤ ∑_{i=1}^n m_i C m_i^T = tr(AC). To complete the proof, observe that ∥B∥_HS^2 = tr(B^T B) ≤ tr(B^T C) = tr(C^T B) ≤ tr(C^T C) = ∥C∥_HS^2, where we have used the fact that the trace of a matrix coincides with the trace of its transpose.
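Theorem B.1 is easy to check numerically: every eigenvalue of a random matrix lies in at least one Gershgorin disc. A quick illustration with placeholder data:

```python
import numpy as np

# Gershgorin: each eigenvalue lam of A satisfies |lam - a_ii| <= R_i for some i,
# where R_i is the i-th absolute row sum without the diagonal entry.
rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
eigs = np.linalg.eigvals(A)                         # possibly complex
R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))      # off-diagonal row sums
in_some_disc = [
    any(abs(lam - A[i, i]) <= R[i] for i in range(n)) for lam in eigs
]
```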
Lemma B.5. If A ∈ R^{n×n} is a symmetric and positive definite matrix and P ∈ R^{n×n} is a symmetric matrix, then there exists ε* = ε*(A, ∥P∥_∞, n) > 0 such that A + εP is symmetric and positive definite for every ε < ε*.
Proof. Denote by λ_1 the smallest eigenvalue of A. For any x ∈ R^n and every ε > 0, . Since λ_1 > 0 by positive definiteness of A, setting ε* = λ_1/(n∥P∥_∞) completes the proof.
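The threshold in lemma B.5 can be tested numerically: for ε below λ_1/(n∥P∥_∞), the perturbed matrix stays positive definite. A toy check with random placeholder matrices:

```python
import numpy as np

# Lemma B.5 sanity check: A symmetric positive definite, P symmetric, and
# eps < lambda_1 / (n * max|P_ij|) keeps A + eps*P positive definite.
rng = np.random.default_rng(5)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + 0.5 * np.eye(n)            # symmetric positive definite
P = rng.standard_normal((n, n))
P = (P + P.T) / 2.0                      # symmetric perturbation
lam1 = np.linalg.eigvalsh(A)[0]          # smallest eigenvalue of A
eps_star = lam1 / (n * np.abs(P).max())  # the threshold from the proof
eps = 0.9 * eps_star                     # any eps < eps_star works
min_eig = np.linalg.eigvalsh(A + eps * P)[0]
```

The quadratic-form bound x^T(A + εP)x ≥ (λ_1 − εn∥P∥_∞)∥x∥_2^2 from the proof guarantees the smallest eigenvalue stays positive.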