Maximal correlation and monotonicity of free entropy and Stein discrepancy

We introduce the maximal correlation coefficient $R(M_1,M_2)$ between two noncommutative probability subspaces $M_1$ and $M_2$ and show that the maximal correlation coefficient between the sub-algebras generated by $s_n:=x_1+\ldots +x_n$ and $s_m:=x_1+\ldots +x_m$ equals $\sqrt{m/n}$ for $m\le n$, where $(x_i)_{i\in \mathbb{N}}$ is a sequence of free and identically distributed noncommutative random variables. This is the free-probability analogue of a result by Dembo--Kagan--Shepp in classical probability. As an application, we use this estimate to provide another simple proof of the monotonicity of the free entropy and the free Fisher information in the free central limit theorem. Moreover, we prove that the free Stein discrepancy introduced by Fathi and Nelson is non-increasing along the free central limit theorem.

In 2001, Dembo, Kagan, and Shepp [8] investigated this maximal correlation for the partial sums $S_n := X_1 + \cdots + X_n$ of i.i.d. random variables $X_i$. Namely, regardless of the distribution of the $X_i$'s, they showed that $R(S_n, S_m) \le \sqrt{m/n}$ for $m \le n$, with equality if $\sigma(X_1) < \infty$.
The aim of this note is to prove the analogous statement in the context of free probability, a theory initiated by Voiculescu (see [18]) in the 1980s which has since flourished into an established area with connections to several other fields. We refer to [11,13] for an introduction to the subject and an extensive list of references. Before stating our main result, let us first recall the necessary framework.
Note that we have $|\rho(x, y)| \le 1$ by the Cauchy-Schwarz inequality. Given two subspaces $M_1, M_2 \subseteq M$, we call
$$R(M_1, M_2) := \sup\big\{ |\rho(x, y)| : x \in M_1,\ y \in M_2 \ \text{non-constant} \big\}$$
the maximal correlation coefficient between $M_1$ and $M_2$. For $M_1 = L^2(x)$ and $M_2 = L^2(y)$ we simply write $R(x, y)$, and we call it the maximal correlation between the noncommutative random variables $x$ and $y$.
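As a quick sanity check (an illustration added here, not needed in the sequel), freeness forces the maximal correlation to vanish: if $x$ and $y$ are free, then for any $a \in L^2(x)$ and $b \in L^2(y)$,
$$\mathrm{cov}(a, b) \;=\; \tau\big((a - \tau(a))(b - \tau(b))\big) \;=\; 0,$$
so that $R(x, y) = 0$. At the other extreme, $R(x, x) = 1$ as soon as $x$ is non-constant.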
We are now ready to state our main result.
Theorem 1.1. Let $(x_i)_{i\in\mathbb{N}}$ be a sequence of free, non-zero, identically distributed, noncommutative random variables, and let $s_n := x_1 + \cdots + x_n$. Then for any $m \le n$, we have $R(s_n, s_m) = \sqrt{m/n}$.
As in the classical setting, the interesting feature of the above statement is its universality: it holds regardless of the distribution of the noncommutative random variables. A possible way to prove the above statement consists of using the microstates approach, approximating the law of each noncommutative random variable by that of random matrices. One then applies the multidimensional version of the classical maximal correlation inequality to the corresponding random matrices (seen as vectors) before passing to the limit and deducing the above theorem. The drawback of this approach is that it does not allow the extension of Theorem 1.1 to the multidimensional case. Indeed, by the refutation of the Connes embedding problem [12], there are noncommutative random variables whose joint moments cannot be approximated well by moments of matrices. This makes it impossible to use the aforementioned approach to prove a multidimensional version of the above theorem. In contrast, our proof of Theorem 1.1, which is carried out in Section 2, adapts the approach of [8] to the noncommutative setting and is readily extendable to the multidimensional setting.
A celebrated result of Artstein et al. [1] provided a solution to Shannon's problem regarding the monotonicity of entropy in the classical central limit theorem. In the context of free probability, the concepts of free entropy and free information were developed by Voiculescu in a series of papers (see for example [21]). Two approaches were given for the definition of free entropy, referred to as microstates and non-microstates and denoted by $\chi$ and $\chi^*$ respectively (see [13, Chapters 7 and 8]). These two coincide in the one-dimensional setting, in which case the free entropy of a compactly supported probability measure $\mu$ is given by
$$\chi(\mu) \;=\; \iint \log|s - t| \, d\mu(s)\, d\mu(t) \;+\; \frac{3}{4} \;+\; \frac{1}{2}\log(2\pi).$$
It is not known whether $\chi$ and $\chi^*$ coincide in the multidimensional setting. Our proof of the result presented in the sequel extends to the multi-dimensional case for $\chi^*$ only. Given a noncommutative probability space $(M, \tau)$ and a self-adjoint element $z \in M$, we define $\chi^*(z)$ as $\chi^*(\mu_z)$, where $\mu_z$ denotes the distribution of $z$, i.e., the probability measure characterized by $\int p \, d\mu_z = \tau(p(z))$ for all polynomials $p \in \mathbb{C}[X]$.
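For orientation (a standard fact recalled here for the reader, not part of the argument): with this normalization, the standard semicircular distribution maximizes free entropy among compactly supported probability measures of unit variance, and its free entropy equals
$$\frac{1}{2}\log(2\pi e),$$
the free analogue of the Gaussian maximizing classical entropy at fixed variance.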
In [16], Shlyakhtenko proved the monotonicity of the free entropy in the free central limit theorem providing an analogue of the result of [1] in the noncommutative setting. As an application of our maximal correlation estimate, we recover the monotonicity property which we state in the next corollary.
Corollary 1.2. Let $(x_i)_{i\in\mathbb{N}}$ be a sequence of free, identically distributed, self-adjoint random variables in $(M, \tau)$ and set $s_k := x_1 + \cdots + x_k$. Then
$$\chi\Big(\frac{s_n}{\sqrt{n}}\Big) \;\ge\; \chi\Big(\frac{s_m}{\sqrt{m}}\Big) \tag{1}$$
for all integers $m \le n$.
As in the classical setting, the monotonicity of the entropy follows from that of the Fisher information. In Section 3, we prove the latter as a consequence of Theorem 1.1. The idea of using the maximal correlation inequality in this context goes back to Courtade [6] who used the result of Dembo, Kagan, and Shepp [8] to provide an alternative proof of the monotonicity of entropy in the classical setting.
Formulated alternatively, the above corollary states that given a compactly supported probability measure $\mu$ and positive integers $m \le n$, one has
$$\chi\big((n^{-1/2})_*\, \mu^{\boxplus n}\big) \;\ge\; \chi\big((m^{-1/2})_*\, \mu^{\boxplus m}\big), \tag{2}$$
where $\boxplus$ denotes the free convolution operation (so $\mu^{\boxplus n}$ is the distribution of the sum of $n$ free copies of an element $x$ with distribution $\mu$), and $\alpha_*$ is the pushforward operation by the dilation $t \mapsto \alpha t$. As a matter of fact, it is possible to make sense of $\mu^{\boxplus t}$ for all real $t \ge 1$ (see [14]). Very recently, Shlyakhtenko and Tao [17] extended (2) to real-valued exponents while providing two different proofs. It would be interesting to see if the argument in this paper, based on maximal correlation, could be extended to cover non-integer exponents.
Another consequence of Theorem 1.1 concerns the monotonicity of the free Stein discrepancy along the free central limit theorem. The Stein discrepancy measures, in some sense, how far a probability measure is from another one characterized by some integration by parts formula. Using the classical maximal correlation inequality, it was proven by Courtade, Fathi and Pananjady [7] that the Stein discrepancy (relative to the standard Gaussian measure) is non-increasing in the central limit theorem. The notion of free Stein discrepancy relative to a semicircular law was introduced by Fathi and Nelson [10]. Recall that a standard semicircular variable $S$ is a self-adjoint element of $(M, \tau)$ whose distribution has density $\frac{1}{2\pi}\sqrt{4 - t^2}$ on $[-2, 2]$. Analogously to the normal distribution, a standard semicircular variable $S \in M$ is characterized by the following integration by parts formula, stating that
$$\tau\big(S\, P(S)\big) \;=\; \big\langle 1 \otimes 1,\ \partial P(S) \big\rangle$$
for every polynomial $P$. Here, $\partial$ denotes the noncommutative derivative and the right-hand side inner product refers to the inner product of the Hilbert space $L^2(M) \otimes L^2(M)$ (see [13]). Following [10], a free Stein kernel of $x \in M$ is an element $K \in L^2(M) \otimes L^2(M)$ such that
$$\tau\big(x\, P(x)\big) \;=\; \big\langle K,\ \partial P(x) \big\rangle$$
for every polynomial $P$. It was shown by Cébron, Fathi and Mai [4] that free Stein kernels always exist when $\tau(x) = 0$. The free Stein discrepancy of $x$ relative to $S$ is then defined as
$$\Sigma^*(x \mid S) \;:=\; \inf_K \big\| K - 1 \otimes 1 \big\|_{L^2(M) \otimes L^2(M)},$$
where the infimum is taken over all free Stein kernels $K$ of $x$. We should note that [10] introduced the notion of free Stein kernel/discrepancy relative to a general potential, while we will only be dealing with the particular case of the potential $t^2/2$, leading to the notions stated above.
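For instance (comparing the two displayed formulas above), for $x = S$ itself the integration by parts formula says that $K = 1 \otimes 1$ is a free Stein kernel, so
$$\Sigma^*(S \mid S) = 0;$$
the discrepancy thus quantifies the failure of $x$ to satisfy the semicircular integration by parts identity.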
As an application of our maximal correlation inequality, we obtain the following corollary, extending the aforementioned monotonicity of the Stein discrepancy obtained in [7] to the free setting.
Corollary 1.3. Given $(x_i)_{i\in\mathbb{N}}$ a sequence of free, centered, identically distributed, self-adjoint random variables in $(M, \tau)$ with unit norm, one has
$$\Sigma^*\Big(\frac{s_n}{\sqrt n} \,\Big|\, S\Big) \;\le\; \sqrt{\frac{m}{n}}\; \Sigma^*\Big(\frac{s_m}{\sqrt m} \,\Big|\, S\Big)$$
for all integers $m \le n$.
Note that taking $m = 1$ in the above corollary, we obtain that $\Sigma^*\big(\frac{s_n}{\sqrt n} \,\big|\, S\big)$ decays at least as fast as $C/\sqrt{n}$ for some constant $C$, recovering a result of [4]. Similarly to the previous results of this paper, the proof of the above corollary works verbatim in the multidimensional setting.
Acknowledgements: The authors would like to thank the anonymous referees for their numerous generous suggestions which greatly improved the manuscript. For instance, the monotonicity of the free Stein discrepancy and its proof were suggested by one of the referees. The authors are grateful to Roland Speicher for helpful comments and for bringing to their attention the recent preprint [17]. The second named author is thankful to Marwa Banna for helpful discussions.

Proof of Theorem 1.1

For a finite set $I \subset \mathbb{N}$, we write $L^2_I := L^2(x_i : i \in I)$ and denote by $\mathrm{proj}_I(z)$ the orthogonal projection of $z \in M$ onto $L^2_I$, which is nothing else but the conditional expectation of $z$ given $L^2_I$:
$$y = \mathrm{proj}_I(z) \iff y \in L^2_I \ \text{ and } \ \forall x \in L^2_I,\ \tau(xy) = \tau(xz).$$
In particular $\mathrm{proj}_I(z) = z$ if $z \in L^2_I$, and by the trace property $\mathrm{proj}_I(xzy) = x\, \mathrm{proj}_I(z)\, y$ for all $x, y \in L^2_I$. Note that $\mathrm{proj}_\emptyset(z) = \tau(z) \cdot 1$ and $\mathrm{proj}_J \circ \mathrm{proj}_I = \mathrm{proj}_J$ for every $J \subseteq I$ (tower property). When freeness is further involved, we can say a bit more. The following lemma appears in different forms in the literature ([3], [13, Section 2.5]); we include its proof for completeness.

Lemma 2.1. Suppose that $x_1, x_2, \ldots$ are free, and let $I, J \subset \mathbb{N}$ be finite sets. Then:
(i) if $z$ is a (noncommutative) polynomial in variables in $\mathbb{C}\langle x_j : j \in J \rangle$, then $\mathrm{proj}_I(z)$ is a polynomial in only those variables that are actually in $\mathbb{C}\langle x_k : k \in I \cap J \rangle$;
(ii) the projections commute: $\mathrm{proj}_I \circ \mathrm{proj}_J = \mathrm{proj}_{I \cap J}$.

Proof.
(i) By linearity of $\mathrm{proj}_I$, we may suppose without loss of generality that $z = a_1 \cdots a_r$ with $a_j \in \mathbb{C}\langle x_{i_j} \rangle$ and consecutively distinct indices $i_1, \ldots, i_r \in J$.
Expanding the conditional expectation in terms of the conditional cumulants $\kappa^I$ taken with respect to $\mathrm{proj}_I$, we have
$$\mathrm{proj}_I(z) \;=\; \sum_{\pi \in NC(r)} \kappa^I_\pi(a_1, \ldots, a_r),$$
where the sum runs over non-crossing partitions $\pi$ of $\{1, \ldots, r\}$. We now invoke [15, Theorem 3.6] with $F := \tau|_{L^2_I}$ and the sub-algebra $N := \mathbb{C}\langle x_j : j \in J \setminus I \rangle$ free from $B := L^2_I$ over $D := \mathbb{C} \cdot 1$ to see that all conditional cumulants involving any variable in $N$ reduce to constants, given by the corresponding unconditioned cumulants $M^k \to \mathbb{C}$. This shows that $\mathrm{proj}_I(z)$ is in fact a polynomial only in the subset of the variables $a_1, \ldots, a_r$ that belong to $\mathbb{C}\langle x_k : k \in I \cap J \rangle$, which proves (i).
(ii) By the tower property, $\mathrm{proj}_{I \cap J} \circ \mathrm{proj}_I \circ \mathrm{proj}_J = \mathrm{proj}_{I \cap J} \circ \mathrm{proj}_J = \mathrm{proj}_{I \cap J}$; hence, if $\mathrm{proj}_I \mathrm{proj}_J(z)$ lies in $L^2_{I \cap J}$, it is left unchanged by $\mathrm{proj}_{I \cap J}$ and therefore equals $\mathrm{proj}_{I \cap J}(z)$. So we only need to check that $\mathrm{proj}_I \mathrm{proj}_J(z) \in L^2_{I \cap J}$. By a closure argument, we may suppose that $\mathrm{proj}_J(z)$ belongs to $\mathbb{C}\langle x_j : j \in J \rangle$. In this case $\mathrm{proj}_I \mathrm{proj}_J(z) \in \mathbb{C}\langle x_k : k \in I \cap J \rangle \subseteq L^2_{I \cap J}$ by the previous point.
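As a simple illustration of Lemma 2.1 (an example added here; any concrete choice would do): for free $x_1, x_2$ and $I = \{1\}$, $J = \{1, 2\}$,
$$\mathrm{proj}_{\{1\}}(x_1 x_2 x_1) \;=\; x_1\, \mathrm{proj}_{\{1\}}(x_2)\, x_1 \;=\; \tau(x_2)\, x_1^2,$$
a polynomial in $x_1$ alone, as predicted by (i).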
We now set, for every $z \in M$ and every finite subset $J \subset \mathbb{N}$,
$$z_J \;:=\; \sum_{K \subseteq J} (-1)^{|J \setminus K|}\, \mathrm{proj}_K(z),$$
where $|\cdot|$ denotes cardinality. The following decomposition will play a crucial role in the proof of Theorem 1.1.
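Before stating it, let us unwind the definition in the simplest non-trivial case (an illustration added here, not part of the argument): for $J = \{1, 2\}$,
$$z_\emptyset = \tau(z)\cdot 1, \qquad z_{\{1\}} = \mathrm{proj}_{\{1\}}(z) - \tau(z)\cdot 1, \qquad z_{\{1,2\}} = \mathrm{proj}_{\{1,2\}}(z) - \mathrm{proj}_{\{1\}}(z) - \mathrm{proj}_{\{2\}}(z) + \tau(z)\cdot 1,$$
and indeed $z_\emptyset + z_{\{1\}} + z_{\{2\}} + z_{\{1,2\}} = \mathrm{proj}_{\{1,2\}}(z)$, in accordance with the next lemma.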

Lemma 2.2 (Efron-Stein decomposition).
For every finite set $I \subset \mathbb{N}$ and every $z \in M$,
$$\mathrm{proj}_I(z) \;=\; \sum_{J \subseteq I} z_J.$$
Proof. We repeat in a compact way the argument of Efron and Stein [9]: by the definition of the $z_J$'s and an exchange of summations,
$$\sum_{J \subseteq I} z_J \;=\; \sum_{J \subseteq I} \sum_{K \subseteq J} (-1)^{|J \setminus K|}\, \mathrm{proj}_K(z) \;=\; \sum_{K \subseteq I} \Big( \sum_{K \subseteq J \subseteq I} (-1)^{|J \setminus K|} \Big)\, \mathrm{proj}_K(z) \;=\; \mathrm{proj}_I(z),$$
since the inner sum equals $(1-1)^{|I \setminus K|}$, which vanishes unless $K = I$.

The elements $z_J$ will be orthogonal thanks to the following direct consequence of Lemma 2.1.

Corollary 2.3. Suppose that $x_1, x_2, \ldots$ are free, and let $I, J \subset \mathbb{N}$ be finite sets such that $I \setminus J \neq \emptyset$. Then $\mathrm{proj}_J(z_I) = 0$ for every $z \in M$. In particular, $z_I$ is orthogonal to $z_J \in L^2_J$.
Proof. Apply Lemma 2.1 and gather the subsets $K \subseteq I$ that have the same intersection $L := J \cap K$ with $J$:
$$\mathrm{proj}_J(z_I) \;=\; \sum_{K \subseteq I} (-1)^{|I \setminus K|}\, \mathrm{proj}_{J \cap K}(z) \;=\; \sum_{L \subseteq I \cap J} \Big( \sum_{\substack{K \subseteq I \\ J \cap K = L}} (-1)^{|I \setminus K|} \Big)\, \mathrm{proj}_L(z) \;=\; 0,$$
since, for each fixed $L$, the subsets $K \subseteq I$ with $J \cap K = L$ are exactly the sets $K = L \cup K'$ with $K' \subseteq I \setminus J$, so the inner signed sum equals $\pm(1-1)^{|I \setminus J|} = 0$ because $I \setminus J \neq \emptyset$.

To prove Theorem 1.1 we shall finally exploit the fact that the partial sum $s_n := x_1 + \cdots + x_n$ is symmetric in $(x_1, \ldots, x_n)$. Our next proposition is tailored for this purpose.

Proposition 2.4. Suppose that $x_1, x_2, \ldots$ are free and identically distributed.
(1) For every $m \le n$ and every symmetric polynomial $z$ in $(x_1, \ldots, x_n)$ with $\tau(z) = 0$, we have
$$\big\|\mathrm{proj}_{L^2(x_1, \ldots, x_m)}(z)\big\|_2 \;\le\; \sqrt{\frac{m}{n}}\, \|z\|_2.$$
(2) For every $m \le n$ and every polynomial $p$, $\mathrm{proj}_{L^2(x_1, \ldots, x_m)}\big(p(s_n)\big) = \mathrm{proj}_{L^2(s_m)}\big(p(s_n)\big)$.
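The combinatorial mechanism behind statements like (1), borrowed from the classical argument of Dembo, Kagan, and Shepp [8], is the elementary estimate (recorded here for the reader's convenience; the precise form used in the proof may differ):
$$\frac{\binom{m}{k}}{\binom{n}{k}} \;=\; \prod_{j=0}^{k-1} \frac{m - j}{n - j} \;\le\; \frac{m}{n}, \qquad 1 \le k \le m \le n,$$
combined with the Efron--Stein decomposition: writing a centered symmetric $z$ as $z = \sum_{\emptyset \neq J \subseteq \{1, \ldots, n\}} z_J$ with $\|z_J\|_2$ depending only on $|J|$, one compares $\sum_{k \ge 1} \binom{m}{k}\, \|z_{\{1,\ldots,k\}}\|_2^2$ with $\sum_{k \ge 1} \binom{n}{k}\, \|z_{\{1,\ldots,k\}}\|_2^2$.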
Proof of Theorem 1.1. The lower bound $R(s_n, s_m) \ge \sqrt{m/n}$ is straightforward since, by freeness, $\sigma(s_n)^2 = n\, \sigma(x_1)^2$ and $\mathrm{cov}(s_n, s_m) = \sigma(s_m)^2 = m\, \sigma(x_1)^2$, so that $\rho(s_n, s_m) = \sqrt{m/n}$. For the upper bound, we must show that $\rho(z, z') \le \sqrt{m/n}$ for all $z \in L^2(s_n)$ and $z' \in L^2(s_m)$. W.l.o.g., we may suppose that $\tau(z) = \tau(z') = 0$ and, by another closure argument, that $z$ is a polynomial in $s_n$ (and thus a symmetric polynomial in $x_1, \ldots, x_n$). Then by the Cauchy-Schwarz inequality and Proposition 2.4,
$$|\tau(z z')| \;=\; \big|\tau\big(\mathrm{proj}_{L^2(s_m)}(z)\, z'\big)\big| \;\le\; \big\|\mathrm{proj}_{L^2(x_1, \ldots, x_m)}(z)\big\|_2\, \|z'\|_2 \;\le\; \sqrt{\tfrac{m}{n}}\; \|z\|_2\, \|z'\|_2,$$
and the proof is complete.

Monotonicity of the free entropy and free Fisher information
The goal of this section is to prove Corollary 1.2. Let us start by noting that the free entropy and the free Fisher information of a self-adjoint element $z \in M$ are related through the integral formula (see [13, Chapter 8])
$$\chi^*(z) \;=\; \frac{1}{2} \int_0^{\infty} \Big( \frac{1}{1+t} - \Phi\big(z + \sqrt{t}\, x\big) \Big)\, dt \;+\; \frac{1}{2} \log(2\pi e), \tag{5}$$
where $x$ is a standard semicircular variable free from $z$, and $\Phi$ denotes the free Fisher information. Following [19] (see also [13, Chapter 8]), the free Fisher information of a noncommutative, self-adjoint random variable $z \in M$ is defined as $\Phi(z) := \|\xi\|_2^2$, where the so-called conjugate variable $\xi := \xi(z)$ is any element of $L^2(z)$ such that, for every integer $r \ge 0$,
$$\tau\big(\xi\, z^r\big) \;=\; \sum_{k=0}^{r-1} \tau\big(z^k\big)\, \tau\big(z^{r-1-k}\big). \tag{6}$$
(If such a $\xi$ does not exist, we set $\Phi(z) := \infty$.) We note from (6) that $\tau(\xi(z)) = 0$ and the homogeneity property $\Phi(\alpha z) = \alpha^{-2} \Phi(z)$, $\alpha > 0$.
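As a quick illustration of (6) (a standard example, added here for concreteness): for a standard semicircular variable $S$, the conjugate variable is $S$ itself. Indeed, the moments of $S$ are the Catalan numbers, $\tau(S^{2p}) = C_p$ and $\tau(S^{2p+1}) = 0$, and the Catalan recurrence gives
$$\tau\big(S \cdot S^r\big) \;=\; \tau\big(S^{r+1}\big) \;=\; \sum_{k=0}^{r-1} \tau\big(S^k\big)\, \tau\big(S^{r-1-k}\big),$$
so $\xi(S) = S$ and $\Phi(S) = \|S\|_2^2 = 1$.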
In the next corollary, we show how the monotonicity of the free Fisher information follows easily from Theorem 1.1.
Corollary 3.1. Let $(x_i)_{i\in\mathbb{N}}$ be a sequence of free, identically distributed, self-adjoint random variables in $(M, \tau)$ and denote $s_k := x_1 + \ldots + x_k$ for every positive integer $k$. Then for all positive integers $m \le n$, we have
$$\Phi\Big(\frac{s_n}{\sqrt n}\Big) \;\le\; \Phi\Big(\frac{s_m}{\sqrt m}\Big).$$
Proof. Assume the existence of $\xi(s_m)$, as otherwise $\Phi(s_m) = \infty$ and there is nothing to prove. According to [13, p. 206], the free sum $s_n = s_m + (s_n - s_m)$ admits $\xi(s_n) = \mathrm{proj}_{L^2(s_n)}(\xi(s_m))$ as conjugate variable. Therefore, since $\tau(\xi(s_m)) = 0$, Theorem 1.1 yields
$$\Phi(s_n) \;=\; \big\|\mathrm{proj}_{L^2(s_n)}\big(\xi(s_m)\big)\big\|_2^2 \;\le\; \frac{m}{n}\, \big\|\xi(s_m)\big\|_2^2 \;=\; \frac{m}{n}\, \Phi(s_m),$$
i.e., $\Phi(s_n) \le \frac{m}{n} \Phi(s_m)$. We conclude by the homogeneity property.
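Spelling out this last step for the reader: by the homogeneity property $\Phi(\alpha z) = \alpha^{-2}\Phi(z)$,
$$\Phi\Big(\frac{s_n}{\sqrt n}\Big) \;=\; n\, \Phi(s_n) \;\le\; n \cdot \frac{m}{n}\, \Phi(s_m) \;=\; m\, \Phi(s_m) \;=\; \Phi\Big(\frac{s_m}{\sqrt m}\Big).$$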
In view of (5) and the divisibility of the semicircular distribution w.r.t. the free convolution, the above corollary readily implies (1), thus proving Corollary 1.2.
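To make this deduction explicit (a routine verification, with auxiliary semicircular variables $\sigma_1, \ldots, \sigma_n$ introduced only for this purpose): for $t \ge 0$, a semicircular variable of variance $t$ can be written as $\frac{1}{\sqrt n}(\sigma_1 + \cdots + \sigma_n)$ with $\sigma_1, \ldots, \sigma_n$ free semicircular variables of variance $t$, free from the $x_i$'s. Hence $\frac{s_n}{\sqrt n} + \sqrt{t}\, x$ has the same distribution as $\frac{1}{\sqrt n}\sum_{i=1}^n (x_i + \sigma_i)$, a normalized sum of free, identically distributed, self-adjoint variables, and Corollary 3.1 gives
$$\Phi\Big(\frac{s_n}{\sqrt n} + \sqrt{t}\, x\Big) \;\le\; \Phi\Big(\frac{s_m}{\sqrt m} + \sqrt{t}\, x\Big)$$
for all $m \le n$ and all $t \ge 0$. Integrating in $t$ via (5) yields (1).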

Monotonicity of the free Stein discrepancy
The goal of this section is to provide a proof of Corollary 1.3. Let us fix $m \le n$ and $x_1, x_2, \ldots$ a sequence of free, centered, identically distributed, self-adjoint random variables in $(M, \tau)$ with unit norm. Let us record the following consequence of Theorem 1.1, which will be used in the sequel: for every $z \in L^2(s_m)$ with $\tau(z) = 0$,
$$\big\|\mathrm{proj}_{L^2(s_n)}(z)\big\|_2 \;\le\; \sqrt{\frac{m}{n}}\; \|z\|_2.$$
Given $K \in L^2(s_m) \otimes L^2(s_m)$ a free Stein kernel of $\frac{s_m}{\sqrt m}$, we have by definition
$$\Big\langle \frac{s_m}{\sqrt m},\ P\Big(\frac{s_m}{\sqrt m}\Big) \Big\rangle \;=\; \Big\langle K,\ \partial P\Big(\frac{s_m}{\sqrt m}\Big) \Big\rangle$$
for every polynomial $P$. Since $s_m$ and $s_n - s_m$ are free, then using the intertwining relation between $\partial$ and the conditional expectation [20], we can write
$$\Big\langle \frac{s_m}{\sqrt m},\ P(s_n) \Big\rangle \;=\; \sqrt{m}\, \big\langle K,\ \partial P(s_n) \big\rangle.$$
The above relation appears in [5] (before Lemma 2.5 there). Using linearity and the fact that the $x_i$'s are identically distributed, we get