Nonlinear Matrix Concentration via Semigroup Methods

Matrix concentration inequalities provide information about the probability that a random matrix is close to its expectation with respect to the ℓ 2 operator norm. This paper uses semigroup methods to derive sharp nonlinear matrix concentration inequalities. In particular, it is shown that the classic Bakry-Émery curvature criterion implies subgaussian concentration for "matrix Lipschitz" functions. This argument circumvents the need to develop a matrix version of the log-Sobolev inequality, a technical obstacle that has blocked previous attempts to derive matrix concentration inequalities in this setting. The approach unifies and extends much of the previous work on matrix concentration. When applied to a product measure, the theory reproduces the matrix Efron-Stein inequalities due to Paulin et al. It also handles matrix-valued functions on a Riemannian manifold with uniformly positive Ricci curvature.

This property further implies local ergodicity of the matrix semigroup, which we can use to prove strong bounds on the trace moments of nonlinear random matrix models.
The power of this approach is that the Bakry-Émery condition has already been verified for a large number of semigroups. We can exploit these results to identify many new settings where matrix concentration is in force. This program entirely evades the question about the proper way to extend log-Sobolev inequalities to matrices.
Our approach reproduces many existing results from the theory of matrix concentration, such as the matrix Efron-Stein inequalities [PMT ]. Among other new results, we can achieve subgaussian concentration for a matrix-valued "Lipschitz" function on a positively curved Riemannian manifold.
Here is a simplified formulation of this fact.
The real-linear space H d contains all d × d Hermitian matrices, and ∥·∥ is the ℓ 2 operator norm. The operators ∂ i compute partial derivatives in local (normal) coordinates.
Theorem . follows from abstract concentration inequalities (Theorem . and Theorem . ) and the classic fact that the Brownian motion on a positively curved Riemannian manifold satisfies the Bakry-Émery criterion [BGL , Sec. . ]. See Section . for details. Particular settings where the theorem is valid include the unit Euclidean sphere and the special orthogonal group. The variance proxy of f is analogous to the squared Lipschitz constant that appears in scalar concentration results. We emphasize that ∂ i f is a Hermitian matrix, and the variance proxy involves a sum of matrix squares. Thus, the "Lipschitz constant" is tailored to the matrix setting.
As a concrete example, consider the n-dimensional sphere S n ⊂ R n+1 , with uniform measure σ n and curvature ρ = n − 1. Let A 1 , . . . , A n+1 ∈ H d be fixed matrices, and construct the random matrix f (z) = Σ i z i A i . By symmetry, E σ n f = 0. Moreover, the variance proxy admits an explicit bound in terms of the coefficients A i .

To start, we develop some basic facts about an important class of Markov semigroups that acts on matrix-valued functions. Given a Markov process, we define the associated matrix Markov semigroup and its infinitesimal generator. Then we construct the matrix carré du champ operator and the Dirichlet form. Afterward, we outline the connection between convergence properties of the semigroup and Poincaré inequalities. Parts of our treatment are adapted from [CHT , ABY ], but some elements appear to be new.

For a matrix A ∈ M d , we write ∥A∥ for the ℓ 2 operator norm, ∥A∥ HS for the Hilbert-Schmidt norm, and tr A for the trace. The normalized trace is defined as tr̄ A := d −1 tr A. Nonlinear functions bind before the trace. Given a scalar function ϕ : R → R, we construct the standard matrix function ϕ : H d → H d using the eigenvalue decomposition: ϕ is applied to each eigenvalue, while the eigenvectors are unchanged. We constantly rely on basic tools from matrix theory; see [Car ].
Let Ω be a Polish space equipped with a probability measure µ. Define E µ and Var µ to be the expectation and variance of a real-valued function with respect to the measure µ. When applied to a random matrix, E µ computes the entrywise expectation. Nonlinear functions bind before the expectation.
. . Markov semigroups acting on matrices. This paper focuses on a special class of Markov semigroups acting on matrices. In this model, a classical Markov process drives the evolution of a matrix-valued function. Remark . mentions some generalizations.
Suppose that (Z t ) t ≥0 ⊂ Ω is a time-homogeneous Markov process on the state space Ω with stationary measure µ. For each matrix dimension d ∈ N, we can construct a Markov semigroup (P t ) t ≥0 that acts on a (bounded) measurable matrix-valued function f : Ω → H d according to for all t ≥ 0 and all z ∈ Ω.
( . ) The semigroup property P t+s = P t P s = P s P t holds for all s, t ≥ 0 because (Z t ) t ≥0 is a homogeneous Markov process.
Note that the operator P 0 is the identity map: P 0 f = f . For a fixed A ∈ H d , regarded as a constant function on Ω, the semigroup also acts as the identity: P t A = A for all t ≥ 0. We use these facts without comment.
Although ( . ) defines a family of semigroups indexed by the matrix dimension d, we will abuse terminology and speak of this collection as if it were a single semigroup. A major theme of this paper is that facts about the action of the semigroup ( . ) on real-valued functions (d = 1) imply parallel facts about the action on matrix-valued functions (d ∈ N).
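As a minimal numerical sketch of the definition ( . ), the following Python snippet evaluates the matrix semigroup by Monte Carlo for an illustrative choice of driving process and function: an Ornstein-Uhlenbeck diffusion Z t = e^{−t} z + sqrt(1 − e^{−2t}) G with G standard normal, and the linear function f (z) = Σ i z i A i with arbitrary Hermitian coefficients A i . These choices are ours and are not fixed by the text; they merely exhibit P 0 = identity and the ergodic limit P t f → E µ f .

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 4
# Illustrative Hermitian coefficients (real symmetric for simplicity).
A = np.array([(M + M.T) / 2 for M in rng.standard_normal((n, d, d))])

def f(Z):
    # f(z) = sum_i z_i A_i, evaluated on a batch Z of shape (samples, n).
    return np.tensordot(Z, A, axes=(1, 0))

def P(t, z, samples=200_000):
    # Monte Carlo evaluation of (P_t f)(z) = E[f(Z_t) | Z_0 = z] for the
    # Ornstein-Uhlenbeck process Z_t = e^{-t} z + sqrt(1 - e^{-2t}) G, G ~ N(0, I).
    G = rng.standard_normal((samples, n))
    Zt = np.exp(-t) * z + np.sqrt(1.0 - np.exp(-2.0 * t)) * G
    return f(Zt).mean(axis=0)

z0 = rng.standard_normal(n)
f0 = f(z0[None, :])[0]
# P_0 is the identity map on functions.
err0 = np.linalg.norm(P(0.0, z0, samples=1) - f0, 2)
# For linear f, (P_t f)(z) = e^{-t} f(z); the semigroup law P_{t+s} = P_t P_s
# is visible through e^{-(t+s)} = e^{-t} e^{-s}.
err1 = np.linalg.norm(P(1.0, z0) - np.exp(-1.0) * f0, 2)
# Ergodicity: P_t f -> E_mu f = 0 as t -> infinity (mu = standard normal).
err_inf = np.linalg.norm(P(8.0, z0), 2)
print(err0, err1, err_inf)
```

The errors are zero (for t = 0) and of Monte Carlo size for the other two checks.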
Remark . (Noncommutative semigroups). There is a very general class of noncommutative semigroups acting on a von Neumann algebra where the action is determined by a family of completely positive unital maps [JZ ]. This framework includes ( . ) as a special case; it covers quantum semigroups [CHT ] acting on H d with a fixed matrix dimension d; it also includes more exotic examples.
We will not study these models, but we will discuss the relationship between our results and prior work.
. . Ergodicity and reversibility. We say that the semigroup (P t ) t ≥0 defined in ( . ) is ergodic if P t f → E µ f as t → +∞ for all f : Ω → R. Furthermore, (P t ) t ≥0 is reversible if each operator P t is a symmetric operator on L 2 (µ). That is, E µ [(P t f ) g] = E µ [f (P t g)] for all t ≥ 0 and all f, g : Ω → R. ( . ) Note that these definitions involve only real-valued functions (d = 1).
In parallel, we say that the Markov process (Z t ) t ≥0 is reversible (resp. ergodic) if the associated Markov semigroup (P t ) t ≥0 is reversible (resp. ergodic). The reversibility of the process (Z t ) t ≥0 implies that, when Z 0 ∼ µ, the pair (Z t , Z 0 ) is exchangeable for all t ≥ 0. That is, (Z t , Z 0 ) and (Z 0 , Z t ) follow the same distribution for all t ≥ 0.
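The exchangeability statement can be checked exactly on a small example. The sketch below (our own illustration, not from the text) uses a two-state continuous-time Markov chain, for which the transition semigroup has a closed form, and verifies both that the joint law of (Z 0 , Z t ) is symmetric and that P t is symmetric on L 2 (µ).

```python
import numpy as np

# Two-state Markov process with illustrative jump rates a: 0 -> 1 and b: 1 -> 0.
a, b = 0.7, 0.3
pi = np.array([b, a]) / (a + b)               # stationary measure mu

def P_t(t):
    # Closed form for the transition semigroup exp(tQ) of a two-state chain,
    # where Q = [[-a, a], [b, -b]].
    Pi = np.tile(pi, (2, 1))
    return Pi + np.exp(-(a + b) * t) * (np.eye(2) - Pi)

t = 0.9
J = np.diag(pi) @ P_t(t)   # joint law of (Z_0, Z_t) when Z_0 ~ mu
# Reversibility <=> detailed balance: (Z_0, Z_t) and (Z_t, Z_0) share one law.
print(np.allclose(J, J.T))

# Equivalently, P_t is symmetric on L^2(mu): E_mu[f * P_t g] = E_mu[(P_t f) * g].
f = np.array([1.0, -2.0]); g = np.array([0.5, 3.0])
lhs = np.sum(pi * f * (P_t(t) @ g))
rhs = np.sum(pi * (P_t(t) @ f) * g)
print(abs(lhs - rhs))
```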
Our matrix concentration results require ergodicity and reversibility of the semigroup action on matrix-valued functions. These properties are actually a consequence of the analogous properties for real-valued functions. Evidently, the ergodicity of (P t ) t ≥0 is equivalent to the statement P t f → E µ f as t → +∞ for all f : Ω → H d and each d ∈ N.
( . ) Indeed, a sequence of matrices converges if and only if all of its entries converge. As for reversibility, we have the following result.
Proposition . (Reversibility). Let (P t ) t ≥0 be the family of semigroups defined in ( . ). The following are equivalent.
( ) The semigroup acting on real-valued functions is symmetric, as in ( . ).
( ) The semigroup acting on matrix-valued functions is symmetric. That is, for each d ∈ N, E µ [(P t f ) g] = E µ [f (P t g)] for all t ≥ 0 and all f, g : Ω → H d .
Let us emphasize that ( . ) now involves matrix products. The proof of Proposition . appears below in Section . .
. . Convexity. Given a convex function Φ : H d → R that is bounded below, the semigroup satisfies a Jensen inequality of the form Φ(P t f ) ≤ P t (Φ ◦ f ). This is an easy consequence of the definition ( . ) of the semigroup as an expectation.

. . The infinitesimal generator. The infinitesimal generator L of the semigroup is defined via the limit L f := lim t ↓0 t −1 (P t f − f ), whenever the limit exists. ( . ) Because (P t ) t ≥0 is a semigroup, it follows immediately that d dt P t = LP t = P t L for all t ≥ 0. Moreover, the stationarity of µ ensures that E µ [L f ] = 0.
That is, the infinitesimal generator converts an arbitrary function into a zero-mean function.
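These two generator facts can be sanity-checked numerically. The sketch below (our illustration; the Ornstein-Uhlenbeck choice and the coefficients A i are assumptions, not from the text) uses the fact that the OU semigroup acts on a linear function f (z) = Σ i z i A i by P t f = e^{−t} f , so L f = −f , and estimates E µ [L f ] by Monte Carlo under the stationary standard normal measure.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4
# Illustrative Hermitian coefficients for a linear test function f(z) = sum_i z_i A_i.
A = np.array([(M + M.T) / 2 for M in rng.standard_normal((n, d, d))])

def f(Z):
    return np.tensordot(np.atleast_2d(Z), A, axes=(1, 0))

# For the Ornstein-Uhlenbeck semigroup, (P_t f)(z) = e^{-t} f(z) for linear f,
# so the generator L f = lim_{t -> 0} (P_t f - f)/t equals -f.
z0 = rng.standard_normal(n)
f0 = f(z0)[0]
t = 1e-6
Lf = (np.exp(-t) * f0 - f0) / t
err_limit = np.linalg.norm(Lf + f0, 2)       # should vanish as t -> 0

# Zero-mean property: E_mu[L f] = -E_mu[f] = 0 under mu = N(0, I_n),
# estimated here by Monte Carlo.
G = rng.standard_normal((200_000, n))
err_mean = np.linalg.norm(f(G).mean(axis=0), 2)
print(err_limit, err_mean)
```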
We say that the infinitesimal generator L is symmetric on L 2 (µ) when its action on real-valued functions is symmetric: E µ [(L f ) g] = E µ [f (Lg)] for all f, g : Ω → R. The generator L is symmetric if and only if the semigroup (P t ) t ≥0 is symmetric (i.e., reversible). In this case, the action of L on matrix-valued functions is also symmetric: E µ [(L f ) g] = E µ [f (Lg)] for all f, g : Ω → H d . This point follows from Proposition . .
As we have alluded, the limit in ( . ) need not exist for all functions. The set of functions f : Ω → H d for which Lf is defined µ-almost everywhere is called the domain of the generator. It is highly technical, but usually unimportant, to characterize the domain of the generator and related operators.
For our purposes, we may restrict attention to an unspecified algebra of suitable functions (say, smooth and compactly supported) where all operations involving limits, derivatives, and integrals are justified. By approximation, we can extend the main results to the entire class of functions where the statements make sense. We refer the reader to the monograph [BGL ] for an extensive discussion about how to make these arguments airtight.
. . Carré du champ operator and Dirichlet form. For each d ∈ N, given the infinitesimal generator L, the matrix carré du champ operator is the bilinear form Γ( f, g) := (1/2) [L( f g) − (L f ) g − f (Lg)]. The matrix Dirichlet form is the bilinear form obtained by integrating the carré du champ: E( f, g) := E µ [Γ( f, g)]. We abbreviate the associated quadratic forms as Γ( f ) := Γ( f, f ) and E( f ) := E( f, f ).

Proposition . states that both these quadratic forms are positive operators in the sense that they take values in the cone of positive-semidefinite Hermitian matrices. In many instances, the carré du champ Γ( f ) has a natural interpretation as the squared magnitude of the derivative of f , while the Dirichlet form E( f ) reflects the total energy of the function f . Using ( . ), we can rewrite the Dirichlet form as E( f, g) = −(1/2) E µ [(L f ) g + f (Lg)]. When the semigroup (P t ) t ≥0 is reversible, then ( . ) and ( . ) yield further equivalent expressions for the Dirichlet form. These alternative expressions are very useful for calculations.
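A worked instance makes the "squared derivative" interpretation concrete. For the Ornstein-Uhlenbeck semigroup and a linear function f (z) = Σ i z i A i (our illustrative assumptions), a direct computation gives P t ( f 2 )(z) − (P t f )(z) 2 = (1 − e^{−2t}) Σ i A i 2 at every point, so the limit form of the carré du champ, Γ( f ) = lim t↓0 (2t)^{−1} [P t ( f 2 ) − (P t f ) 2 ], recovers the sum of matrix squares Σ i A i 2 , which is manifestly positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 4
A = np.array([(M + M.T) / 2 for M in rng.standard_normal((n, d, d))])
S = np.einsum('kij,kjl->il', A, A)   # sum of matrix squares, sum_i A_i @ A_i

# For OU and linear f: P_t(f^2)(z) - (P_t f)(z)^2 = (1 - e^{-2t}) * S for all z,
# so the limit quotient below converges to Gamma(f) = S as t -> 0.
def gamma_approx(t):
    return (1.0 - np.exp(-2.0 * t)) / (2.0 * t) * S

errs = [np.linalg.norm(gamma_approx(t) - S, 2) for t in (1.0, 0.1, 0.001)]
print(errs)                                   # decreasing toward 0 as t -> 0
print(np.linalg.eigvalsh(S).min() >= -1e-10)  # Gamma(f) is positive semidefinite
```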
. . The matrix Poincaré inequality. For each function f : Ω → H d , the matrix variance with respect to the distribution µ is defined as Var µ [ f ] := E µ [( f − E µ f ) 2 ]. We say that the Markov process satisfies a matrix Poincaré inequality with constant α > 0 if Var µ [ f ] ⪯ α E( f ) for each d ∈ N and each suitable f : Ω → H d . ( . ) This definition seems to be due to Chen et al. [CHT ]; see also Aoun et al. [ABY ].
When the matrix dimension d = 1, the inequality ( . ) reduces to the usual scalar Poincaré inequality for the semigroup. For the semigroup ( . ), the scalar Poincaré inequality (d = 1) already implies the matrix Poincaré inequality (for all d ∈ N). Therefore, to check the validity of ( . ), it suffices to consider real-valued functions.
Proposition . (Poincaré inequalities: Equivalence). For each d ∈ N, let (P t ) t ≥0 be the semigroup defined in ( . ). The following are equivalent: the scalar Poincaré inequality holds (d = 1), and the matrix Poincaré inequality ( . ) holds for every d ∈ N. The proof of Proposition . appears in Section . . We are grateful to Ramon van Handel for this observation.
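The matrix Poincaré inequality can be verified exactly on a tiny product space. In the sketch below (our illustration; the hypercube, the coordinate-resampling semigroup, and its standard discrete carré du champ are assumptions anticipating the product-measure construction later in the text), the uniform measure on {−1, +1}^n satisfies the matrix Poincaré inequality with α = 1, and for a linear function f (z) = Σ i z i A i both sides equal Σ i A i 2 .

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
d, n = 3, 4
A = np.array([(M + M.T) / 2 for M in rng.standard_normal((n, d, d))])

def f(z):
    # Matrix-valued function on the hypercube {-1, +1}^n.
    return np.tensordot(np.array(z, dtype=float), A, axes=(0, 0))

pts = list(product([-1, 1], repeat=n))       # enumerate the uniform measure exactly
Ef = sum(f(z) for z in pts) / len(pts)

def gamma(z):
    # Discrete squared derivative of the coordinate-resampling semigroup:
    # Gamma(f)(z) = (1/2) sum_i E[(f(z) - f(z with i-th coordinate resampled))^2].
    G = np.zeros((d, d))
    for i in range(n):
        for new in (-1, 1):
            w = list(z); w[i] = new
            D = f(z) - f(w)
            G += 0.5 * 0.5 * (D @ D)         # inner 1/2 = prob of each resampled value
    return G

# Matrix Poincare inequality Var(f) <= alpha * E[Gamma(f)] with alpha = 1;
# for linear f it holds with equality: both sides equal sum_i A_i^2.
S = np.einsum('kij,kjl->il', A, A)
Var = sum((f(z) - Ef) @ (f(z) - Ef) for z in pts) / len(pts)
Dirichlet = sum(gamma(z) for z in pts) / len(pts)
print(np.allclose(Var, S), np.allclose(Dirichlet, S))
```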
. . Poincaré inequalities and ergodicity. As in the scalar case, the matrix Poincaré inequality ( . ) is a powerful tool for understanding the action of a semigroup on matrix-valued functions. Assuming ergodicity, the Poincaré inequality is equivalent to the exponential convergence of the Markov semigroup (P t ) t ≥0 to the expectation operator E µ . The constant α determines the rate of convergence.
The following result makes this principle precise. In Section , we will use these semigroups to derive matrix concentration results for several random matrix models.
. . . Product measures. Consider a product space Ω = Ω 1 × Ω 2 × · · · × Ω n equipped with a product measure µ = µ 1 ⊗ µ 2 ⊗ · · · ⊗ µ n . In Section , we present the standard construction of the associated Markov semigroup, adapted to the matrix setting. This semigroup is ergodic and reversible, and its carré du champ operator takes the form of a discrete squared derivative: Γ( f )(z) = (1/2) Σ i E [( f (z) − f (z; Z i )) 2 ] for all z ∈ Ω.
( . ) In this expression, Z = (Z 1 , . . . , Z n ) ∼ µ, and (z; Z i ) denotes the vector z with its ith coordinate replaced by Z i .

. . . Log-concave measures. Next, we treat a class of probability measures on Ω = R n that are closely related to diffusion processes. A log-concave measure takes the form dµ ∝ e −W(z) dz where the potential W : R n → R is a convex function, so it captures a form of negative dependence. The associated diffusion process naturally induces a semigroup whose carré du champ operator takes the form of the squared "magnitude" of the gradient: Γ( f )(z) = Σ i (∂ i f (z)) 2 . As usual, ∂ i := ∂/∂z i for i = 1, . . . , n.
Many interesting results follow from the condition that the potential W is uniformly strongly convex on R n . In other words, for a constant η > 0, we assume that the Hessian matrix satisfies Hess W(z) ⪰ η · I n for all z ∈ R n . ( . ) Here, Hess W := (∂ ij W) where ∂ ij := ∂ 2 /(∂z i ∂z j ) for i, j = 1, . . . , n. It is a standard result [BGL , Sec. . ] that the strong convexity condition ( . ) implies the scalar Bakry-Émery criterion with constant c = η −1 . Therefore, according to Proposition . , the matrix Bakry-Émery criterion ( . ) is valid for every d ∈ N. One of the core examples of a log-concave measure is the standard Gaussian measure on R n , which is given by the potential W(z) = z T z/2. The associated diffusion process induces the Ornstein-Uhlenbeck semigroup, which satisfies the Bakry-Émery criterion ( . ) with constant c = 1.
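The Bakry-Émery contraction can be displayed exactly in a special case. Assume (our illustration, not from the text) a quadratic potential W(z) = z^T H z / 2 with Hess W = H ⪰ η I n and a linear function f (z) = Σ k z k A k . The Langevin semigroup then acts by P t f (z) = f (e^{−tH} z), so Γ(P t f ) = Σ k,l (e^{−2tH}) kl A k A l while P t Γ( f ) = Σ k A k 2 is constant, and the criterion Γ(P t f ) ⪯ e^{−2ηt} P t Γ( f ) becomes a finite semidefinite inequality that can be checked directly.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 4
A = np.array([(M + M.T) / 2 for M in rng.standard_normal((n, d, d))])
S = np.einsum('kij,kjl->il', A, A)               # Gamma(f) = sum_k A_k^2 for linear f

# Quadratic potential W(z) = z^T H z / 2 with Hess W = H >= eta * I_n.
B = rng.standard_normal((n, n))
H = B @ B.T + 0.5 * np.eye(n)
eta = np.linalg.eigvalsh(H).min()

w, V = np.linalg.eigh(H)
E2t = lambda t: V @ np.diag(np.exp(-2.0 * t * w)) @ V.T    # matrix exponential e^{-2tH}

def gamma_Ptf(t):
    # Gamma(P_t f) = sum_{k,l} (e^{-2tH})_{kl} A_k A_l for linear f.
    C = E2t(t)
    return np.einsum('kl,kij,ljm->im', C, A, A)

# Bakry-Emery: Gamma(P_t f) <= e^{-2 eta t} P_t Gamma(f) in the semidefinite order.
t = 0.4
gap = np.exp(-2.0 * eta * t) * S - gamma_Ptf(t)
print(np.linalg.eigvalsh(gap).min())   # nonnegative up to roundoff
```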
A more detailed discussion on log-concave measures is presented in Section .
. . Measures on Riemannian manifolds. The theory of diffusion processes on Euclidean spaces can be generalized to the setting of Riemannian manifolds. Although this exercise may seem abstract, it allows us to treat some interesting and important examples in a unified way. We refer to [BGL ] for more background on this subject, and we instate their conventions.
Consider an n-dimensional compact Riemannian manifold (M, g). Let g(x) = (g ij (x) : 1 ≤ i, j ≤ n) be the matrix representation of the co-metric tensor g in local coordinates, which is a symmetric and positive-definite matrix defined for every x ∈ M. The manifold is equipped with a canonical Riemannian probability measure µ g that has local density dµ g ∝ det(g(x)) −1/2 dx with respect to the Lebesgue measure in local coordinates. This measure µ g is the stationary measure of the diffusion process on M whose infinitesimal generator L is the Laplace-Beltrami operator ∆ g . This diffusion process is called the Riemannian Brownian motion.¹ The associated matrix carré du champ operator coincides with the squared "magnitude" of the differential: in local coordinates, Γ( f ) = Σ i, j g ij (∂ i f )(∂ j f ). Here, ∂ i for i = 1, . . . , n are the components of the differential, computed in local coordinates. We emphasize that the matrix carré du champ operator is intrinsic; expressions for the carré du champ resulting from different choices of local coordinates are equivalent under change of variables. See Section for a more detailed discussion.
As mentioned in Remark . , the scalar Bakry-Émery criterion holds with c = ρ −1 if and only if the Ricci curvature tensor of (M, g) is everywhere positive, with eigenvalues bounded from below by ρ > 0. In other words, for Brownian motion on a manifold, the Bakry-Émery criterion is equivalent to the uniform positive curvature of the manifold. Proposition . ensures that the matrix Bakry-Émery criterion ( . ) holds with c = ρ −1 under precisely the same circumstances.
Many examples of positively curved Riemannian manifolds are discussed in [Led , Gro , CE , BGL ]. We highlight two particularly interesting cases.
Example . (Unit sphere). Consider the n-dimensional unit sphere S n ⊂ R n+1 for n ≥ 2. The sphere is equipped with the Riemannian manifold structure induced by R n+1 . The canonical Riemannian measure on the sphere is simply the uniform probability measure. The sphere has a constant Ricci curvature tensor, whose eigenvalues all equal n − 1. Therefore, the Brownian motion on S n satisfies the Bakry-Émery criterion ( . ) with c = (n − 1) −1 . See [BGL , Sec. . ].
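The sphere example can be probed numerically. In the sketch below (our illustration; the linear function f (z) = Σ i z i A i is an assumption), z is sampled uniformly from S n , so E f = 0 by symmetry, and since E [z i z j ] = δ ij /(n + 1), the matrix variance equals Σ i A i 2 /(n + 1): fluctuations shrink as the dimension, and hence the curvature, grows.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 3, 50                          # the sphere S^n sits inside R^{n+1}
A = np.array([(M + M.T) / 2 for M in rng.standard_normal((n + 1, d, d))])
S = np.einsum('kij,kjl->il', A, A)    # sum_i A_i^2

# Uniform samples on the sphere: normalize standard Gaussian vectors.
m = 100_000
Z = rng.standard_normal((m, n + 1))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
F = np.tensordot(Z, A, axes=(1, 0))   # F[k] = f(Z[k]) = sum_i Z[k, i] A_i

Ef = F.mean(axis=0)
Var_mc = np.einsum('kij,kjl->il', F, F) / m - Ef @ Ef
err_mean = np.linalg.norm(Ef, 2)                     # E f = 0 by symmetry
err_var = np.linalg.norm(Var_mc - S / (n + 1), 2)    # Var(f) = S / (n + 1)
print(err_mean, err_var)
```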
Example . (Special orthogonal group). The special orthogonal group SO(n) can be regarded as a Riemannian submanifold of R n×n . The canonical Riemannian measure is the Haar probability measure on SO(n). It is known that the eigenvalues of the Ricci curvature tensor are uniformly bounded below by (n − 1)/4. Therefore, the Brownian motion on SO(n) satisfies the Bakry-Émery criterion ( . ) with c = 4/(n − 1). See [Led , pp. ff].
The lower bound on Ricci curvature is stable under (Riemannian) products of manifolds, so similar results are valid for products of spheres or products of the orthogonal group; cf. [Led , p. ].
. . History. In the scalar setting, much of the classic research on Markov processes concerns the behavior of diffusion processes on Riemannian manifolds. Functional inequalities connect the convergence of these Markov processes to the geometry of the manifold. The rate of convergence to equilibrium of a Markov process plays a core role in developing concentration properties for the measure. The treatise [BGL ] contains a comprehensive discussion. Other references include [Led , BLM , vH ].
Matrix-valued Markov processes were originally introduced to model the evolution of quantum systems [Dav , Lin , AFL ]. In recent years, the long-term behavior of quantum Markov processes has received significant attention in the field of quantum information. A general approach to exponential convergence of a quantum system is to establish quantum log-Sobolev inequalities for density operators [MOZ , OZ , KT ].
In this paper, we consider a mixed classical-quantum setting, where a classical Markov process drives a matrix-valued function. The papers [CHT , CH , ABY ] contain some foundational results for this model. Our work provides a more detailed understanding of the connections between the ergodicity of the semigroup and matrix functional inequalities. The companion paper [HT ] contains further results on trace Poincaré inequalities, which are equivalent to the Poincaré inequality ( . ).
A general framework for noncommutative diffusion processes on von Neumann algebras can be found in [JLMX , JZ ]. In particular, the paper [JZ ] shows that a noncommutative Bakry-Émery criterion implies local ergodicity of a noncommutative diffusion process.
¹Many authors use the convention that Riemannian Brownian motion has infinitesimal generator (1/2)∆ g .

In spite of its generality, the presentation in [JZ ] does not fully contain our treatment. On the one hand, the noncommutative semigroup model includes the mixed classical-quantum model ( . ) as a special case. On the other hand, we do not need the underlying Markov process to be a diffusion (with continuous sample paths), while Junge & Zeng pose a diffusion assumption.
. Nonlinear Matrix Concentration: Main Results. The matrix Poincaré inequality ( . ) has been associated with subexponential concentration inequalities for random matrices [ABY , HT ]. The central purpose of this paper is to establish that the (scalar) Bakry-Émery criterion leads to matrix concentration inequalities via a straightforward semigroup method. This section outlines our main results; the proofs appear in Section .
Remark . (Noncommutative setting). After this paper was written, we learned that Junge & Zeng [JZ ] have used the (noncommutative) Bakry-Émery criterion to obtain subgaussian moment bounds for elements of a von Neumann algebra using a martingale approach. Their setting is more general (if we ignore the diffusion assumptions), but we will see that their results are weaker in several respects.
. . Markov processes and random matrices. Let Z be a random variable, taking values in the state space Ω, with the distribution µ. For a matrix-valued function f : Ω → H d , we can define the random matrix f (Z), whose distribution is the push-forward of µ by the function f . Our goal is to understand how well the random matrix f (Z) concentrates around its expectation E f (Z) = E µ f .
To do so, suppose that we can construct a reversible, ergodic Markov process (Z t ) t ≥0 ⊂ Ω whose stationary distribution is µ. We have the intuition that the faster the process (Z t ) t ≥0 converges to equilibrium, the more sharply the random matrix f (Z) concentrates around its expectation.
To quantify the rate of convergence of the matrix Markov process, we use the Bakry-Émery criterion ( . ) to obtain local ergodicity of the semigroup. This property allows us to prove strong bounds on the trace moments of the random matrix. Using standard arguments (Appendix A), these moment bounds imply nonlinear matrix concentration inequalities.
. . Polynomial concentration. We begin with a general estimate on the polynomial trace moments of a random matrix under a Bakry-Émery criterion.

Theorem . (Polynomial moments).
Let Ω be a Polish space equipped with a probability measure µ. Consider a reversible, ergodic Markov semigroup ( . ) with stationary measure µ that acts on (suitable) functions f : Ω → H d . Assume that the Bakry-Émery criterion ( . ) holds for a constant c > 0. Then, for q = 1 and q ≥ 1.5,

If the variance proxy is finite, these moment bounds lead to subgaussian concentration.
We establish this theorem in Section .
For noncommutative diffusion semigroups, Junge & Zeng [JZ ] have developed polynomial moment bounds similar to Theorem . , but they only obtain moment growth of O(q) in the inequality ( . ). We can trace this discrepancy to the fact that they use a martingale argument based on the noncommutative Burkholder-Davis-Gundy inequality. At present, our proof only applies to the mixed classical-quantum semigroup ( . ), but it seems plausible that our approach can be generalized.
For now, let us present some concrete results that follow when we apply Theorem . to the semigroups discussed in Section . . In each of these cases, we can derive bounds for the expectation and tails of ∥ f − E µ f ∥ using the matrix Chebyshev inequality (Proposition A. ). In particular, when the variance proxy is finite, we obtain subgaussian concentration.
. . . Polynomial Efron-Stein inequality for product measures. The first consequence of Theorem . is a polynomial moment inequality for product measures. This result exactly reproduces the matrix polynomial Efron-Stein inequalities established by Paulin et al. [PMT , Theorem . ].
Corollary . (Product measure: Polynomial moments). Let µ = µ 1 ⊗ µ 2 ⊗ · · · ⊗ µ n be a product measure on a product space Ω = Ω 1 × Ω 2 × · · · × Ω n . Let f : Ω → H d be a suitable function. Then, for q = 1 and q ≥ 1.5, the polynomial moment bound holds with the discrete carré du champ ( . ). The details appear in Section . .

. . . Log-concave measures. The second result is a new polynomial moment inequality for matrix-valued functions of a log-concave measure. To avoid domain issues, we restrict our attention to the Sobolev space H 2, µ (R n ; H d ). For these functions, we have the following matrix concentration inequality.
Corollary . (Log-concave measure: Polynomial moments). Let dµ ∝ e −W(z) dz be a log-concave measure on R n whose potential W : R n → R satisfies a uniform strong convexity condition: Hess W ⪰ η · I n with constant η > 0. Let f ∈ H 2, µ (R n ; H d ). Then, for q = 1 and q ≥ 1.5, the polynomial moment bound holds with the carré du champ ( . ) and c = η −1 . The details appear in Section . .
. . Exponential concentration. As a consequence of the Bakry-Émery criterion ( . ), we can also derive exponential matrix concentration inequalities. In principle, polynomial moment inequalities are stronger, but the exponential inequalities often lead to better constants and more detailed information about tail decay.

Theorem . (Exponential concentration).
Let Ω be a Polish space equipped with a probability measure µ. Consider a reversible, ergodic Markov semigroup with stationary measure µ that acts on (suitable) functions f : Ω → H d . Assume that the Bakry-Émery criterion ( . ) holds for a constant c > 0. Then the maximum eigenvalue of f − E µ f admits an exponential concentration bound in terms of the function r f , which computes an exponential mean of the carré du champ. In addition, suppose that the variance proxy is finite; then the tail bound is subgaussian.

Parallel inequalities hold for the minimum eigenvalue λ min .
We establish Theorem . in Section . as a consequence of an exponential moment inequality, Theorem . , for random matrices. By combining Theorem . with the examples in Section . , we obtain concentration results for concrete random matrix models.
A partial version of Theorem . with slightly worse constants appears in [JZ , Corollary . ]. When comparing these results, note that the probability measure in [JZ ] is normalized to absorb the dimensional factor d.
. . . Exponential Efron-Stein inequality for product measures. We can reproduce the matrix exponential Efron-Stein inequalities of Paulin et al. [PMT , Theorem . ] by applying Theorem . to a product measure (Section . . ). For instance, we obtain the following subgaussian inequality.
We defer the proof to Section . .

. . . Log-concave measures. We can also obtain exponential concentration for a matrix-valued function of a log-concave measure by combining Theorem . with the results in Section . . .

Corollary . (Log-concave measure: Subgaussian concentration). Let dµ ∝ e −W(z) dz be a log-concave probability measure on R n whose potential W : R n → R satisfies a uniform strong convexity condition: Hess W ⪰ η · I n where η > 0. Let f ∈ H 2, µ (R n ; H d ), and define the variance proxy in terms of the carré du champ ( . ). Then f − E µ f satisfies subgaussian expectation and tail bounds. See Section . for the proof.
Example . (Matrix Gaussian series). Consider the standard normal measure γ n on R n . Its potential, W(z) = z T z/2, is uniformly strongly convex with parameter η = 1. Therefore, Corollary . gives subgaussian concentration for matrix-valued functions of a Gaussian random vector. To make a comparison with familiar results, we construct the matrix Gaussian series f (z) = Σ i z i A i with fixed coefficients A i ∈ H d . In this case, the carré du champ is simply Γ( f ) = Σ i A i 2 . Thus, the expectation bound controls E ∥ f ∥ in terms of ∥ Σ i A i 2 ∥. Up to and including the constants, this matches the sharp bound that follows from "linear" matrix concentration techniques [Tro , Chapter ].
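A quick simulation (our illustration) compares the empirical mean norm of a matrix Gaussian series against the classical linear-theory bound E ∥ Σ i g i A i ∥ ≤ sqrt(2 v log(2d)) with variance proxy v = ∥ Σ i A i 2 ∥ [Tro ]; the specific dimensions and random coefficients below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, m = 8, 30, 20_000
A = np.array([(M + M.T) / 2 for M in rng.standard_normal((n, d, d))])
v = np.linalg.norm(np.einsum('kij,kjl->il', A, A), 2)   # variance proxy ||sum A_i^2||

# Sample the matrix Gaussian series f(g) = sum_i g_i A_i with g standard normal.
g = rng.standard_normal((m, n))
F = np.tensordot(g, A, axes=(1, 0))
mean_norm = np.mean(np.linalg.norm(F, 2, axis=(1, 2)))   # empirical E||f||

# Classical "linear" bound: E||f|| <= sqrt(2 v log(2d)).
bound = np.sqrt(2.0 * v * np.log(2 * d))
print(mean_norm, bound)
```

The empirical mean norm falls below the bound, and the gap is only the logarithmic slack familiar from the linear theory.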
Van Handel (private communication) has outlined an alternative proof of Corollary . with slightly worse constants. His approach uses Pisier's method [Pis , Thm. . ] and the noncommutative Khintchine inequality [Buc ] to obtain the statement for the standard normal measure. Then Caffarelli's contraction theorem [Caf ] implies that the same bound holds for every log-concave measure whose potential is uniformly strongly convex with η ≥ 1. This approach is short and conceptual, but it is more limited in scope.
. . Riemannian measures. As discussed in Section . , the Brownian motion on a Riemannian manifold with uniformly positive curvature satisfies the Bakry-Émery criterion ( . ). Therefore, we can apply both Theorem . and Theorem . in this setting. Let us give a few concrete examples of the kind of results that can be derived with these methods.
. . . The sphere. Consider the uniform distribution σ n on the n-dimensional unit sphere S n ⊂ R n+1 for n ≥ 2. The Brownian motion on the sphere satisfies the Bakry-Émery criterion ( . ) with c = (n − 1) −1 . Therefore, Theorem . implies polynomial moment bounds for any suitable function f : S n → H d , where the carré du champ Γ( f ) is defined by ( . ). We can also obtain subgaussian tail bounds in terms of the variance proxy, the essential supremum of ∥ Γ( f ) ∥ with respect to σ n ; indeed, Theorem . yields such a bound.

To use these concentration inequalities, we need to compute the carré du champ Γ( f ) and bound the variance proxy for particular functions f . We give two illustrations, postponing the detailed calculations to Section . . In each case, let A 1 , . . . , A n+1 ∈ H d be fixed Hermitian matrices.

Example . (Sphere I). Consider the random matrix f (z) = Σ i z i A i for z ∈ S n . We can compute the carré du champ as Γ( f )(z) = Σ i A i 2 − f (z) 2 . Compare this calculation with Example . , where the coefficients follow the standard normal distribution. For the sphere, the carré du champ operator is smaller because a finite-dimensional sphere has slightly more curvature than the Gauss space.

Example . (Sphere II). Consider the random matrix
A simple bound shows that the variance proxy is at most 2 max i, j ∥ A i − A j ∥ . It is possible to make further improvements in some cases.
. . . The special orthogonal group. The Riemannian manifold framework also encompasses matrix-valued functions of random orthogonal matrices. For instance, suppose that O 1 , . . . , O n ∈ SO(d) are drawn independently and uniformly from the Haar measure µ on the special orthogonal group SO(d).
As discussed in Section . , the Brownian motion on the product space satisfies the Bakry-Émery criterion ( . ). Here is a particular example where we can bound the variance proxy.
The details of the calculation appear in Section . .
. . Extension to general rectangular matrices. By a standard formal argument, we can extend the results in this section to a function h : Ω → M d 1 ×d 2 that takes rectangular matrix values. To do so, we simply apply the theorems to the self-adjoint dilation of h, the Hermitian matrix in H d 1 +d 2 whose off-diagonal blocks are h and h ∗ and whose diagonal blocks are zero. See [Tro ] for many examples of this methodology.
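The dilation trick works because the extreme eigenvalues of the dilation coincide with ±∥h∥, so Hermitian eigenvalue bounds transfer to rectangular norm bounds. A minimal check (illustrative dimensions):

```python
import numpy as np

rng = np.random.default_rng(7)
d1, d2 = 3, 5
h = rng.standard_normal((d1, d2))           # a rectangular matrix value

# Self-adjoint dilation: the (d1 + d2)-dimensional Hermitian matrix with h and h*
# in the off-diagonal blocks and zero diagonal blocks.
H = np.block([[np.zeros((d1, d1)), h],
              [h.T, np.zeros((d2, d2))]])

print(np.allclose(H, H.T))                  # the dilation is Hermitian
# Its maximum eigenvalue equals the l2 operator norm of h.
print(np.linalg.eigvalsh(H).max(), np.linalg.norm(h, 2))
```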
. . History. Matrix concentration inequalities are noncommutative extensions of their scalar counterparts. They have been studied extensively, and they have had a profound impact on a wide range of areas in computational mathematics and statistics. The models for which the most complete results are available include a sum of independent random matrices [LP , Rud , Oli , Tro , Hua ] and a matrix-valued martingale sequence [PX , Oli , Tro , JZ , HRMS ]. We refer to the monograph [Tro ] for an introduction and an extensive bibliography. Very recently, some concentration results for products of random matrices have also been established [HW , HNWTW ].
In recent years, many authors have sought concentration results for more general random matrix models. One natural idea is to develop matrix versions of scalar concentration techniques based on functional inequalities or based on Markov processes.
In the scalar setting, the subadditivity of the entropy plays a basic role in obtaining modified log-Sobolev inequalities for product spaces, a core ingredient in proving subgaussian concentration results. The approach of [PMT ] can be viewed as a discrete version of the semigroup approach that we use in this paper; see Appendix C for more discussion.
Very recently, Aoun et al. [ABY ] showed how to derive exponential matrix concentration inequalities from the matrix Poincaré inequality ( . ). Their approach is based on the classic iterative argument, due to Aida & Stroock [AS ], that operates in the scalar setting. For matrices, it takes serious effort to implement this technique. In our companion paper [HT ], we have shown that a trace Poincaré inequality leads to stronger exponential concentration results via an easier argument.
Another appealing contribution of the paper [ABY ] is to establish the validity of a matrix Poincaré inequality for particular matrix-valued Markov processes. Unfortunately, Poincaré inequalities are apparently not strong enough to capture subgaussian concentration. In the scalar case, log-Sobolev inequalities lead to subgaussian concentration inequalities. At present, it is not clear how to extend the theory of log-Sobolev inequalities to matrices, and this obstacle has delayed progress on studying matrix concentration via functional inequalities.
In the scalar setting, one common technique for establishing a log-Sobolev inequality is to prove that the Bakry-Émery criterion holds [vH , Problem . ]. Inspired by this observation, we have chosen to investigate the implications of the Bakry-Émery criterion ( . ) for Markov semigroups acting on matrix-valued functions. Our work demonstrates that this type of curvature condition allows us to establish matrix moment bounds directly, without the intermediation of a log-Sobolev inequality. As a consequence, we can obtain subgaussian and subgamma concentration for nonlinear random matrix models.
After establishing the results in this paper, we discovered that Junge & Zeng [JZ ] have also derived subgaussian matrix concentration inequalities from the (noncommutative) Bakry-Émery criterion. Their approach is based on a noncommutative version of the Burkholder-Davis-Gundy inequality and a martingale argument that applies to a wider class of noncommutative diffusion semigroups acting on von Neumann algebras. As a consequence, their results apply to a larger family of examples, but the moment growth bounds are somewhat worse.
In contrast, our paper develops a direct argument for the mixed classical-quantum semigroup ( . ) that does not require any sophisticated tools from operator theory or noncommutative probability. Instead, we establish a new trace inequality (Lemma . ) that mimics the chain rule for a scalar diffusion semigroup.
( ) For all suitable f, g : Ω → H d and all s > 0,

Similar results hold for the matrix Dirichlet form, owing to the definition ( . ).
Proof. Proof of ( ). The limit form of the carré du champ can be verified with a short calculation: The first relation depends on the definition ( . ) of Γ and the definition ( . ) of L.
Proof of ( ). The fact that f → Γ( f ) is positive follows from ( ) because the square of a matrix is positive-semidefinite and the expectation preserves positivity.
Proof of ( ). The Young inequality for the carré du champ follows from the fact that Γ is positive: The second relation holds because Γ is a bilinear form.
Proof of ( ). To establish operator convexity, we use bilinearity again: The next lemma is an extension of Proposition . ( ). We use this result to establish the all-important chain rule inequality in Section .
Lemma . (Triple product). Let (Z t ) t ≥0 be a reversible Markov process with a stationary measure µ and infinitesimal generator L. For all suitable f, g, h : Ω → H d and all z ∈ Ω, In particular,

Proof. For simplicity, we abbreviate We have applied the cyclic property of the trace. Using the reversibility ( . ) of the Markov process and the zero-mean property ( . ) of the infinitesimal generator, we have This concludes the second part of the lemma.
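The abstract objects above (the generator L and the carré du champ Γ) can be made concrete on a finite state space. The following minimal sketch (our toy construction, not from the paper) builds a reversible jump generator acting entrywise on matrix-valued functions and confirms numerically that Γ( f ) = ½(L( f ²) − f · L f − L f · f ) is positive semidefinite at every state, as asserted above.

```python
import numpy as np

# Toy example (ours): a reversible 3-state jump chain with generator L acting
# entrywise on matrix-valued functions, (Lf)(z) = sum_w Q[z, w] f(w).
# We form the matrix carre du champ
#   Gamma(f)(z) = (1/2) * ( L(f^2)(z) - f(z)(Lf)(z) - (Lf)(z) f(z) )
# and check that it is positive semidefinite at every state.
rng = np.random.default_rng(0)

# Rate matrix Q, reversible w.r.t. the uniform measure: symmetric off-diagonal
# rates with rows summing to zero.
R = rng.random((3, 3)); R = (R + R.T) / 2; np.fill_diagonal(R, 0)
Q = R - np.diag(R.sum(axis=1))

d = 4
f = [A + A.T for A in rng.standard_normal((3, d, d))]  # Hermitian f(z)

def apply_L(g):
    return [sum(Q[z, w] * g[w] for w in range(3)) for z in range(3)]

Lf  = apply_L(f)
Lf2 = apply_L([F @ F for F in f])
Gamma = [0.5 * (Lf2[z] - f[z] @ Lf[z] - Lf[z] @ f[z]) for z in range(3)]

for z in range(3):
    assert np.linalg.eigvalsh(Gamma[z]).min() > -1e-10  # PSD at every state
print("Gamma(f)(z) is positive semidefinite at every state")
```

For a jump generator, Γ( f )(z) = ½ Σ_w Q[z, w] ( f (w) − f (z))², a nonnegative combination of squares of Hermitian matrices, which explains the positivity observed here.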
. . Reversibility. In this section, we establish Proposition . , which states that reversibility of the semigroup ( . ) on real-valued functions is equivalent to the reversibility of the semigroup on matrix-valued functions. The pattern of argument was suggested to us by Ramon van Handel, and it will be repeated below in the proofs that certain functional inequalities for real-valued functions are equivalent to functional inequalities for matrix-valued functions.
Proof of Proposition . . The implication that matrix reversibility ( . ) for all d ∈ N implies scalar reversibility is obvious: just take d = 1. To check the converse, we require an elementary identity. For all vectors u, v ∈ C d and all matrices A, B ∈ H d , We have defined a j := u * Ae j and b j := v * Be j for each j = 1, . . . , d. As usual, (e j : 1 ≤ j ≤ d) is the standard basis for C d . Now, consider two matrix-valued functions f, g : Ω → H d . Introduce the scalar functions f j := u * f e j and g j := v * ge j for each j = 1, . . . , d. The definition ( . ) of the semigroup (P t ) t ≥0 as an expectation ensures that The parallel statement holds for v * (P t g)e j . Therefore, we can use formula ( . ) to compute that The matrix identity ( . ) follows immediately because u, v ∈ C d are arbitrary.
. . Dimension reduction. The following lemma explains how to relate the carré du champ operator of a matrix-valued function to the carré du champ operators of some scalar functions. It will help us transform the scalar Poincaré inequality and the scalar Bakry-Émery criterion to their matrix equivalents.
Lemma . (Dimension reduction of carré du champ). Let (P t ) t ≥0 be the semigroup defined in ( . ). The carré du champ operator Γ and the iterated carré du champ operator Γ 2 satisfy These formulae hold for all d ∈ N, for all suitable functions f : Ω → H d , and for all vectors u ∈ C d .
Proof. The definition ( . ) of L implies that Introduce the scalar function f j := u * f e j for each j = 1, . . . , d. Then we can use the definition ( . ) of Γ and formula ( . ) to compute that This is the first identity ( . ). The second identity ( . ) follows from a similar argument based on the definition ( . ) of Γ 2 and the relation ( . ).
. . Equivalence of scalar and matrix inequalities. In this section, we verify Proposition . and Proposition . . These results state that functional inequalities for the action of the semigroup ( . ) on real-valued functions induce functional inequalities for its action on matrix-valued functions.
Proof of Proposition . . It is evident that the validity of the matrix Poincaré inequality ( ) for all d ∈ N implies the scalar Poincaré inequality ( ), which is simply the d = 1 case. For the reverse implication, we invoke formula ( . ) to learn that Moreover, we can take the expectation E µ of formula ( . ) to obtain Applying the scalar Poincaré inequality ( ) to the real scalar functions Re(u * f e j ) and Im(u * f e j ), we obtain the required bound for each term. This immediately implies the matrix Poincaré inequality ( ).
Proof of Proposition . . It is evident that the validity of the matrix Bakry-Émery criterion ( ) for all d ∈ N implies the validity of the scalar criterion ( ), as we only need to set d = 1. To develop the reverse implication, we use Lemma . to compute that The inequality follows by applying ( ) to the real scalar functions Re(u * f e j ) and Im(u * f e j ) for each j = 1, . . . , d.
Since u ∈ C d is arbitrary, we immediately obtain ( ).
. . Derivative formulas. A standard way to establish the equivalence between the Poincaré inequality and the exponential ergodicity property is by studying derivatives with respect to the time parameter t. The following result, extending [ABY , Lemma . ], calculates the derivatives of the matrix variance and the Dirichlet form along the semigroup (P t ) t ≥0 . The result parallels the scalar case.
Lemma . (Dissipation of variance and energy). Let (P t ) t ≥0 be a Markov semigroup with stationary measure µ, infinitesimal generator L, and Dirichlet form E. For all suitable f : Ω → H d , Proof. By the definition ( . ) of the matrix variance and the stationarity property E µ P t = E µ , we can calculate that d dt The second equality above uses the derivative relation ( . ) for the generator, and the third equality is the expression ( . ) for the Dirichlet form. Similarly, we can calculate that d dt The first equality is ( . ). The last equality holds because L is symmetric.
The matrix Poincaré inequality ( . ) allows us to convert the derivative formulas in Lemma . into differential inequalities for matrix-valued functions. The next lemma gives the solution to these differential inequalities.

Lemma . (Differential matrix inequality). Assume that

Proof. Since integration preserves the semidefinite order, Multiply by e νt to arrive at the stated result.
. . Consequences of the Poincaré inequality. This section contains the proof of Proposition . , the equivalence between the matrix Poincaré inequality and exponential ergodicity properties. This proof is adapted from its scalar analog [vH , Theorem . ].
Proof of Proposition . . Proof that ( ) ⇒ ( ). To see that the matrix Poincaré inequality ( ) implies exponential ergodicity ( ) of the variance, combine Lemma . with the matrix Poincaré inequality to obtain a differential inequality: Lemma . gives the solution: This is the ergodicity of the variance.
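The variance-decay statement can be made tangible numerically. The sketch below (our toy chain; the Poincaré constant α = 1/λ is taken to be the reciprocal spectral gap, an assumption of this example rather than a quantity from the paper) checks the exponential ergodicity Var µ [P t f ] ⪯ e −2t/α Var µ [ f ] in the semidefinite order.

```python
import numpy as np

# Toy check (ours): for a reversible chain with spectral gap lam, the matrix
# Poincare inequality holds with alpha = 1/lam, and the matrix variance of
# P_t f decays in the semidefinite order:
#   Var[P_t f] <= exp(-2*lam*t) * Var[f].
rng = np.random.default_rng(1)
n, d = 5, 3

R = rng.random((n, n)); R = (R + R.T) / 2; np.fill_diagonal(R, 0)
Q = R - np.diag(R.sum(axis=1))            # reversible w.r.t. uniform measure
lam = np.sort(np.linalg.eigvalsh(-Q))[1]  # spectral gap of -Q

f = np.array([A + A.T for A in rng.standard_normal((n, d, d))])

def variance(g):
    m = g.mean(axis=0)                    # E_mu g under the uniform measure
    return np.mean([gz @ gz for gz in g], axis=0) - m @ m

w, V = np.linalg.eigh(Q)
for t in [0.1, 0.5, 2.0]:
    Pt = V @ np.diag(np.exp(t * w)) @ V.T     # P_t = e^{tQ}
    Ptf = np.einsum('zw,wij->zij', Pt, f)
    gap = np.exp(-2 * lam * t) * variance(f) - variance(Ptf)
    assert np.linalg.eigvalsh(gap).min() > -1e-10
print("Var[P_t f] <= exp(-2t/alpha) Var[f] in the semidefinite order")
```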
Proof that ( ) ⇒ ( ). To obtain the matrix Poincaré inequality ( ) from exponential ergodicity ( ) of the variance, use the derivative ( . ) of the variance and the fact that P 0 is the identity map to see that The inequality follows from ( ).
Proof that ( ) ⇒ ( ) under reversibility. Next, we argue that the matrix Poincaré inequality ( ) implies exponential ergodicity ( ) of the energy, assuming that the semigroup is reversible. In this case, the zero-mean property ( . ) implies that for all suitable f, g. Therefore, Lemma . gives the solution to the differential inequality: This is the ergodicity of energy.
Proof that ( ) ⇒ ( ) under ergodicity. To see that exponential ergodicity ( ) of the energy implies the matrix Poincaré inequality ( ) when the semigroup is ergodic, we combine ( ) with the derivative ( . ) of the Dirichlet form to obtain d dt Using the ergodicity assumption ( . ) on the semigroup, we have The inequality follows from ( ). Apply Lemma . to reach the bound A(t) ⪯ e −2t/c A(0). This yields

Proof that ( ) ⇒ ( ). Next, we argue that local ergodicity of the carré du champ operator ( ) implies the local matrix Poincaré inequality ( ). Construct the function B(s) := Taking the derivative with respect to s gives d ds This is the local ergodicity property.

Proof that ( ) ⇒ ( ). Last, we show that the local matrix Poincaré inequality ( ) implies the matrix Bakry-Émery criterion ( ). Construct the function C(t) :=
Evidently, C(0) = 0, and the local Poincaré inequality ( ) implies that C(t) ⪰ 0 for all t ≥ 0. Now, the first derivative satisfies (d/dt)|_{t=0} The second derivative takes the form (d²/dt²)|_{t=0}

D. HUANG AND J. A. TROPP
Therefore, This verifies the validity of the matrix Bakry-Émery criterion with constant c.

. F
The main results of this paper, Theorems . and . , demonstrate that the Bakry-Émery criterion ( . ) leads to trace moment inequalities for random matrices. This section is dedicated to the proofs of these theorems. These arguments appear to be new, even in the scalar setting, but see [Led , Sch ] for some precedents.
. . Overview. Let (P t ) t ≥0 be a reversible, ergodic semigroup acting on matrix-valued functions.
Assume that the semigroup satisfies a Bakry-Émery criterion ( . ), so Proposition . implies that it is locally ergodic. Without loss of generality, we may assume that the matrix-valued function f is zero-mean: E µ f = 0.
For a standard matrix function ϕ, the basic idea is to estimate a trace moment of the form E µ tr[ f ϕ( f )] via a classic semigroup argument: In the second term on the right-hand side, the time derivative places the infinitesimal generator L in the integrand, which then becomes This familiar formula is the starting point for our method.
To control the trace of the carré du champ, we employ the following fundamental lemma, which is related to the Stroock-Varopoulos inequality [Str , Var ].
Lemma . (Chain rule inequality). Let ϕ : R → R be a function such that ψ := |ϕ ′ | is convex. For all suitable f, g : Ω → H d , The proof of this lemma appears below in Section . . Lemma . isolates the contributions from the matrix P t f and the matrix ϕ( f ) in the formula ( . ). To estimate Γ(P t f ), we invoke the local ergodicity property, Proposition . ( ). Last, we apply matrix decoupling techniques, based on the Hölder and Young trace inequalities, to bound E tr [Γ( f ) ψ( f )] and E tr [Γ(P t f ) ψ( f )] in terms of the original quantity of interest E µ tr[ f ϕ( f )]. The following sections supply full details.
Our approach incorporates some techniques and ideas from [PMT , Theorems . and . ], but the argument is distinct. Appendix C gives more details about the connection.
. . Proof of chain rule inequality. To prove Lemma . , we require a novel trace inequality.

Proof of Lemma . from Lemma . . For simplicity, we abbreviate Fix a parameter s > 0. For each t > 0, the mean value trace inequality, Lemma . , yields It follows from the triple product result, Lemma . , that the second term satisfies Sequence the displays ( . ), ( . ), and ( . ) to reach The last relation is Proposition . ( ). Minimize the right-hand side over s ∈ (0, ∞) to arrive at This completes the proof of Lemma . .
. . Polynomial moments. This section is dedicated to the proof of Theorem . , which states that the Bakry-Émery criterion implies matrix polynomial moment bounds.

Remark . (Missing powers).
A similar argument holds when q ∈ (1, 1.5). It requires a variant of Lemma . that holds for monotone ψ, but has an extra factor of 2 on the right-hand side.
. . . A Markov semigroup argument. By the ergodicity assumption ( . ), it holds that P ∞ f = E µ f = 0. Therefore, By convexity of ψ, we can invoke the chain rule inequality, Lemma . , to obtain The last inequality is the local ergodicity condition, Proposition . ( ).
. . . Decoupling. Apply Hölder's inequality for the trace followed by Hölder's inequality for the expectation to obtain Introduce the bounds ( . ) into ( . ) to find that It remains to remove the semigroup from the integral.

This establishes ( . ).
Define the uniform bound v f := ∥Γ( f )∥ L ∞ (µ) . We have the further estimate The statement ( . ) now follows from ( . ). This step completes the proof of Theorem . .
. . Exponential moments. In this section, we establish Theorem . , the exponential matrix concentration inequality. The main technical ingredient is a bound on exponential moments:
Choose a suitable function f : Ω → H d . We may assume that E µ f = 0. Furthermore, we only need to consider the case θ ≥ 0. The results for θ < 0 follow formally under the change of variables θ → −θ and f → − f .
The quantity of interest is the normalized trace mgf: We will bound the derivative of this function: We have introduced the function ϕ : x → e θx for x ∈ R. Note that its absolute derivative ψ(x) := |ϕ ′ (x)| = θe θx is a convex function, since θ ≥ 0. Here and elsewhere, we use the properties of the trace mgf that are collected in Proposition A. .

. . . A Markov semigroup argument. By the ergodicity assumption ( . ), we have Invoke the chain rule inequality, Lemma . , to obtain The second inequality is the local ergodicity condition, Proposition . ( ).
. . . Decoupling. The next step is to use an entropy inequality to separate the carré du champ operator in ( . ) from the matrix exponential. The following trace inequality appears as [MJC + , Proposition A. ]; see also [Car , Theorem . ].
The trace exponential tr exp(·) is convex; see [Car , Theorem . ]. The Jensen inequality ( . ) for the semigroup implies that Combine the last two displays to obtain Thus, the two terms on the right-hand side of ( . ) have matching bounds. Sequence the displays ( . ), ( . ), and ( . ) to reach This is the integrand in ( . ). Next, we simplify this expression to arrive at a differential inequality.
. . . A differential inequality. In view of Proposition A. (A. ), we have log m(θ) ≥ 0 and hence Substitute this bound into ( . ) and compute the integral to arrive at the differential inequality Finally, we need to solve for the trace mgf.
Moreover, it is easy to check that r( β) ≤ v f . Since this bound is independent of β, we can take β → +∞ in ( . ) to achieve ( . ). This completes the proof of Theorem . .
. . Exponential matrix concentration. We are now ready to prove Theorem . , the exponential matrix concentration inequality, as a consequence of the moment bounds of Theorem . . To do so, we use the standard matrix Laplace transform method, summarized in Appendix A.
Proof of Theorem . from Theorem . . To obtain inequalities for the maximum eigenvalue λ max , we apply Proposition A. to the random matrix X = f (Z) − E µ f where Z ∼ µ. To do so, we first need to weaken the moment bound ( . ): Then substitute c 1 = cr( β) and c 2 = c/ β into Proposition A. to achieve the results stated in Theorem . .
To obtain bounds for the minimum eigenvalue λ min , we apply Proposition A. instead to the random matrix −X.

In this section, we introduce the classic Markov process for a product measure. We check the Bakry-Émery criterion for this Markov process, which leads to matrix concentration results for product measures.
. . Product measures and Markov processes. Consider a product space Ω = Ω 1 × Ω 2 × · · · × Ω n equipped with a product measure µ = µ 1 ⊗ µ 2 ⊗ · · · ⊗ µ n . We can construct a Markov process (Z t ) t ≥0 on Ω as follows. Let (N i t ) t ≥0 for i = 1, . . . , n be a sequence of independent Poisson processes. Whenever N i t increases for some i, we replace the value of Z i t in Z t by an independent sample from µ i while keeping the remaining coordinates fixed.
To describe the Markov semigroup associated with this Markov process, we need some notation.
Let Z = (Z 1 , Z 2 , . . . , Z n ) ∈ Ω be a random vector drawn from the measure µ; that is, each coordinate Z i ∈ Ω i is drawn independently from the measure µ i . Throughout this section, we write E Z := E Z∼µ . The Markov semigroup (P t ) t ≥0 induced by the Markov process is given by (P t f )(z) = Σ I ⊆{1,...,n} (1 − e −t ) |I | e −t(n−|I |) · E Z f (z; Z) I for all z ∈ Ω.
The infinitesimal generator L of the semigroup admits the explicit form The difference operator δ i is given by This infinitesimal generator L is well defined for all integrable functions, so the class of suitable functions contains L 1 (µ). It follows from the definition of δ i that Thus, the infinitesimal generator L is symmetric on L 2 (µ). As a consequence, the semigroup is reversible, and the Dirichlet form is given by for any f, g : Ω → H d , where Z̃ is an independent copy of Z. All the results above and their proofs can be found in [vH , ABY ].
. . Carré du champ operators. The following lemma gives the formulas for the matrix carré du champ operator and the iterated matrix carré du champ operator.
Lemma . (Product measure: Carré du champs). The matrix carré du champ operator Γ and the iterated matrix carré du champ operator Γ 2 of the semigroup ( . ) are given by the formulas These expressions are valid for all suitable f, g : Ω → H d and all z ∈ Ω. The random variables Z and Z̃ are independent draws from the measure µ.
Theorem . (Product measure: Bakry-Émery). For the semigroup ( . ), the Bakry-Émery criterion ( . ) holds with c = 2. That is, for any suitable function f : Ω → H d ,

Proof. Comparing the two expressions in Lemma . with f = g gives which is the stated inequality.
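The theorem can be checked numerically from first principles. The sketch below (our construction, not from the paper) builds the coordinate-resampling generator on a small product of Bernoulli measures, forms Γ and Γ 2 directly from the definitions, and verifies Γ 2 ( f ) ⪰ ½Γ( f ) for a random matrix-valued f.

```python
import numpy as np
from itertools import product

# Toy check (ours): the coordinate-resampling chain on a product of Bernoulli
# measures.  We compute Gamma and Gamma_2 directly from the generator and
# confirm the Bakry-Emery criterion Gamma_2(f) >= (1/2) Gamma(f), i.e. c = 2.
rng = np.random.default_rng(2)
n, d = 3, 2
p = rng.uniform(0.2, 0.8, size=n)          # mu_i = Bernoulli(p_i)
states = list(product([0, 1], repeat=n))
N = len(states)

# Generator L = sum_i (Pi_i - I), where Pi_i resamples coordinate i from mu_i.
L = -n * np.eye(N)
for zi, z in enumerate(states):
    for i in range(n):
        for b in (0, 1):
            w = list(z); w[i] = b
            L[zi, states.index(tuple(w))] += p[i] if b else 1 - p[i]

f = [A + A.T for A in rng.standard_normal((N, d, d))]

def apL(g):
    return [sum(L[z, w] * g[w] for w in range(N)) for z in range(N)]

def Gamma(g, h):   # symmetrized bilinear carre du champ
    Lgh = apL([g[z] @ h[z] + h[z] @ g[z] for z in range(N)])
    Lg, Lh = apL(g), apL(h)
    return [0.25 * (Lgh[z] - g[z] @ Lh[z] - Lh[z] @ g[z]
                    - Lg[z] @ h[z] - h[z] @ Lg[z]) for z in range(N)]

# Gamma_2(f) = (1/2) L Gamma(f) - Gamma(f, Lf)  (using symmetry of Gamma)
G = Gamma(f, f)
G2 = [0.5 * LG - GLf for LG, GLf in zip(apL(G), Gamma(f, apL(f)))]

for z in range(N):
    assert np.linalg.eigvalsh(G2[z] - 0.5 * G[z]).min() > -1e-8
print("Gamma_2(f) >= Gamma(f)/2 at every state: Bakry-Emery holds with c = 2")
```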
After completing this paper, we learned that Theorem . appears in [JZ , Example . ] with a different style of proof.
Remark . (Matrix Poincaré inequality: Constants). Following the discussion in Section . , Theorem . implies the matrix Poincaré inequality ( . ) with α = 2. However, Aoun et al. [ABY ] proved that the Markov process ( . ) actually satisfies the matrix Poincaré inequality with α = 1; see also [CH , Theorem . ]. This gap is not surprising because the averaging operation that is missing in the local Poincaré inequality contributes to the global convergence of the Markov semigroup.
. . Matrix concentration results. In this subsection, we complete the proofs of the matrix concentration results for product measures stated in Section .
For a product measure µ = µ 1 ⊗ µ 2 ⊗ · · · ⊗ µ n , Theorem . shows that there is a reversible ergodic Markov semigroup whose stationary measure is µ and which satisfies the Bakry-Émery criterion ( . ) with constant c = 2. We then apply Theorem . with c = 2 to obtain the polynomial moment bounds in Corollary . . Similarly, we apply Theorem . with c = 2 to obtain the subgaussian concentration inequalities in Corollary . .

. Bakry-Émery criterion for log-concave measures
In this section, we study a class of log-concave measures; the most important example in this class is the standard Gaussian measure. First, we introduce the standard diffusion process associated with a log-concave measure. We verify that the associated semigroup is reversible and ergodic via standard arguments. Then we introduce the Bakry-Émery criterion which follows from the uniform strong convexity of the potential.
. . Log-concave measures and Markov processes. Consider the Markov process (Z t ) t ≥0 on R n generated by the stochastic differential equation dZ t = −∇W(Z t ) dt + √2 dB t , where B t is the standard n-dimensional Brownian motion and W : R n → R is a smooth convex function. The stationary measure µ of this process has the density dµ = ρ ∞ (z) dz = B −1 e −W(z) dz, where B := ∫ R n e −W(z) dz is a normalization constant. The infinitesimal generator L is given by The class of suitable functions is the Sobolev space H 2, µ (R n ; H d ), defined in ( . ). Here and elsewhere, ∂ i means ∂/∂z i and ∂ ij means ∂ 2 /(∂z i ∂z j ) for all i, j = 1, . . . , n.
. . . Reversibility. The reversibility of this Markov process (Z t ) t ≥0 can be verified with a standard calculation. We restrict our attention to functions in H 2, µ (R n ; H d ). Integration by parts yields This shows that L is symmetric on L 2 (µ) and thus (Z t ) t ≥0 is reversible. From the calculation above, we also obtain a simple formula for the associated Dirichlet form: These results parallel the scalar case, but the partial derivatives are matrix-valued.
. . . Ergodicity. We now turn to the ergodicity of the Markov process given by ( . ), which generally reduces to studying the convergence of the corresponding Fokker-Planck equation: We define ρ x (z, t) to be the density of Z t , conditional on Z 0 = x ∈ R n . As usual, δ(z − x) is the Dirac distribution centered at x. The associated Markov semigroup (P t ) t ≥0 can be recognized as The semigroup (P t ) t ≥0 is ergodic in the sense of ( . ) if and only if ρ x (·, t) converges weakly to ρ ∞ for all x ∈ R n .
A fundamental way to prove the convergence of ( . ) to the stationary density ρ ∞ is through the method of Lyapunov functions [Hai , JSY ]. However, ergodicity in the weak sense follows more easily from the assumption that the function W is uniformly strongly convex. That is, (Hess W)(z) ⪰ η · I n for all z ∈ R n .
To see this, recall the Brascamp-Lieb inequality [BL , Theorem . ], which states that the (ordinary) variance of a scalar function h : R n → R is bounded as Combine the last two displays to arrive at the Poincaré inequality Var µ [h] ≤ η −1 E(h). Next, consider the scalar function ϕ x (z, t) := (ρ x (z, t)− ρ ∞ (z))/ρ ∞ (z). Let us check that its variance Var µ [ϕ x (·, t)] converges to 0 exponentially fast. Indeed, it is not hard to verify that ϕ x (z, t) satisfies the partial differential equation Along with the Poincaré inequality and the fact that E µ ϕ x (·, t) = 0, this implies d dt Therefore, the quantity Var µ [ϕ x (·, t)] converges to 0 exponentially fast because As a consequence, for any f ∈ H 2, µ (R n ; R) and any x ∈ R n , This justifies the pointwise convergence of P t f and the ergodicity ( . ) of the semigroup (P t ) t ≥0 .
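The Brascamp-Lieb step can be illustrated with a deterministic quadrature check in one dimension (our toy potential W(z) = z²/2 + 0.1 z⁴, for which W″ ≥ η = 1; the test function h is also ours):

```python
import numpy as np

# Quadrature check (ours) of the Brascamp-Lieb bound
#   Var_mu[h] <= E_mu[ h'(z)^2 / W''(z) ]
# and the resulting Poincare inequality Var_mu[h] <= (1/eta) E_mu[h'^2]
# for the uniformly strongly convex potential W(z) = z^2/2 + 0.1 z^4.
z = np.linspace(-8, 8, 40001)
dz = z[1] - z[0]
W2 = 1 + 1.2 * z**2                      # W''(z) >= eta = 1
rho = np.exp(-(z**2 / 2 + 0.1 * z**4))
rho /= rho.sum() * dz                    # normalized density of mu

h, dh = np.sin(z) + z**2, np.cos(z) + 2 * z   # test function and derivative

def E(g):                                # expectation under mu by quadrature
    return (g * rho).sum() * dz

var = E(h**2) - E(h)**2
assert var <= E(dh**2 / W2) <= E(dh**2)  # Brascamp-Lieb, then eta = 1
print(f"Var = {var:.4f} <= Brascamp-Lieb bound = {E(dh**2 / W2):.4f}")
```

The second inequality holds pointwise because 1/W″(z) ≤ 1/η, which is exactly how the Poincaré inequality with constant η −1 follows in the text.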
. . Carré du champ operators. After checking reversibility and ergodicity, we now turn to the derivation of the matrix carré du champ operator and the iterated matrix carré du champ operator. Their explicit forms are given in the next lemma.
Lemma . (Log-concave measure: Carré du champs). The matrix carré du champ operator Γ and the iterated matrix carré du champ operator Γ 2 of the Markov process defined by ( . ) are given by the and for all suitable f, g : R n → H d .
Proof of Lemma . . Knowing the explicit form ( . ) of the Markov generator L, we can compute the carré du champ operator Γ as Moreover, combining the expressions ( . ) and ( . ) yields the following: Then we can compute that This gives the expression ( . ).
. . Bakry-Émery criterion. It is a well-known result that a Bakry-Émery criterion follows from the uniform strong convexity of W. For example, see the discussion in [BGL , Sec. . ]. Nevertheless, we provide a short proof here for the sake of completeness.

Fact . (Log-concave measure: Matrix Bakry-Émery). Consider the Markov process defined by ( . ).
If the potential W : R n → R satisfies (Hess W)(z) ⪰ η · I n for all z ∈ R n for some constant η > 0, then the Bakry-Émery criterion ( . ) holds with c = η −1 . That is, for any suitable function f : R n → H d ,

Proof. Comparing the two expressions in Lemma . with f = g gives that The second inequality follows from the uniform strong convexity of W. Proposition . extends the scalar Bakry-Émery criterion to matrices.
. . Standard normal distribution. The most important example of a strongly log-concave measure occurs for the potential W(z) = |z| 2 /2. In this case, the corresponding log-concave measure µ coincides with the n-dimensional standard Gaussian distribution N(0, I n ). The associated Markov process is known as the Ornstein-Uhlenbeck process. The semigroup (P t ) t ≥0 has a simple form, given by the Mehler formula: The ergodicity of this Markov semigroup is obvious from the above formula because e −t → 0 as t → +∞. Lemma . gives the matrix carré du champ operator Γ and the iterated matrix carré du champ operator Γ 2 for the Ornstein-Uhlenbeck process: Clearly, Γ( f ) ⪯ Γ 2 ( f ). Therefore, the Bakry-Émery criterion ( . ) holds with c = 1.
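The Mehler formula lends itself to a quick numerical sanity check. The sketch below (ours; it uses Gauss-Hermite quadrature for the Gaussian expectation) verifies the Hermite eigenfunction decay P t : x ↦ e −t x and x² − 1 ↦ e −2t (x² − 1) for the scalar Ornstein-Uhlenbeck semigroup.

```python
import numpy as np

# Sanity check (ours) of the Mehler formula for the Ornstein-Uhlenbeck
# semigroup: (P_t f)(x) = E[ f(e^{-t} x + sqrt(1 - e^{-2t}) G) ], G ~ N(0,1).
# Hermite polynomials are eigenfunctions: P_t maps x -> e^{-t} x and
# x^2 - 1 -> e^{-2t} (x^2 - 1).
nodes, wts = np.polynomial.hermite_e.hermegauss(60)
wts = wts / wts.sum()                    # normalize to the N(0, 1) measure

def P(t, f, x):
    return np.sum(wts * f(np.exp(-t) * x + np.sqrt(1 - np.exp(-2 * t)) * nodes))

x, t = 1.3, 0.7
assert abs(P(t, lambda y: y, x) - np.exp(-t) * x) < 1e-10
assert abs(P(t, lambda y: y**2 - 1, x) - np.exp(-2 * t) * (x**2 - 1)) < 1e-10
print("Mehler formula reproduces the Hermite eigenfunction decay")
```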
. . Matrix concentration results. Finally, we prove the matrix concentration results for log-concave measures stated in Section .
Consider a log-concave probability measure dµ ∝ e −W(z) dz on R n , where the potential satisfies the strong convexity condition Hess W ⪰ η I n for η > 0. Fact . states that the associated semigroup ( . ) satisfies the Bakry-Émery criterion with constant c = η −1 . We then apply Theorem . with c = η −1 to obtain the polynomial moment bounds in Corollary . . Similarly, we apply Theorem . with c = η −1 to obtain the subgaussian concentration inequalities in Corollary . .

. E R
In this section, we give a high-level discussion about diffusion processes on Riemannian manifolds. The book [BGL ] contains a comprehensive treatment of the subject. For an introduction to calculus on Riemannian manifolds, references include [Pet , Lee ].
. . Measures on Riemannian manifolds. Let (M, g) be an n-dimensional Riemannian manifold whose co-metric tensor g(x) = (g^{ij}(x) : 1 ≤ i, j ≤ n) is symmetric and positive definite for every x ∈ M. We write G(x) = (g_{ij}(x) : 1 ≤ i, j ≤ n) for the metric tensor, which satisfies the relation G(x) = g(x) −1 .
The Riemannian measure µ g on the manifold (M, g) has density dµ g ∝ det(g(x(z))) −1/2 dz with respect to the Lebesgue measure in local coordinates. Whenever this measure is finite, we normalize it to obtain a probability measure. In particular, a compact Riemannian manifold always admits a Riemannian probability measure.
The matrix Laplace-Beltrami operator ∆ g on the manifold is defined as Here, ∂ i and the like represent the components of the differential with respect to local coordinates. The diffusion process on M whose infinitesimal generator is ∆ g is called the Riemannian Brownian motion. The measure µ g is the stationary measure for the Brownian motion.
To generalize, one may consider a weighted measure dµ ∝ e −W dµ g where the potential W : M → R is sufficiently smooth. The associated infinitesimal generator is then the Laplace-Beltrami operator plus a drift term: It is not hard to check that L is symmetric with respect to µ, and hence the induced diffusion process with drift is reversible.
. . Carré du champ operators. Next, we present expressions for the matrix carré du champ operators associated with the infinitesimal generator L defined in ( . ). The derivation follows from a standard symbol calculation, as in the scalar setting.
. . . Carré du champ operator. The carré du champ operator coincides with the squared "magnitude" of the differential: where ·, · G is the inner product on T x M associated with the metric tensor G.
The expression ( . ) coincides with ( . ) if we choose (v i : 1 ≤ i ≤ n) to be the moving frame of N = n local coordinates. In this case, ⟨v i (x), v j (x)⟩ G = g ij (x) for i, j = 1, . . . , n. Moreover, the tangential gradient can be written as where ∇ i M f (x) := Σ n j=1 g ij ∂ j f for i = 1, . . . , n. Then one can rewrite the expression ( . ) in the form ( . ) by recalling that G = g −1 .
The expression ( . ) is especially useful when the Riemannian manifold M is embedded into a higher-dimensional Euclidean space R N with the metric tensor G induced by the Euclidean metric. That is, M is a Riemannian submanifold of R N . In this case, for a function f : When the matrix dimension d > 1, the Hessian ∇ 2 f is a -tensor. Now, the iterated matrix carré du champ operator Γ 2 admits the formula Again, this expression involves matrix products. The Ricci tensor Ric = (Ric ij : 1 ≤ i, j ≤ n) is given by The Ricci tensor expresses the curvature of the manifold.
. . Bakry-Émery criterion. Since the first sum in the expression ( . ) for Γ 2 ( f ) is a positive-semidefinite matrix, we have the inequality In a Euclidean space, the Ricci tensor is everywhere zero, so the Bakry-Émery criterion ( . ) relies on the strong convexity of the potential W, as we have seen in Section . In contrast, on a Riemannian manifold, the Ricci tensor plays an important role.
Let us now assume that the Riemannian manifold is unweighted; that is, the potential W = 0 identically. By comparing the displays ( . ) and ( . ) for a scalar function f : M → R, we can see that the scalar Bakry-Émery criterion holds with constant c = ρ −1 , provided that g(x) Ric(x) g(x) ⪰ ρ g(x), or equivalently Ric(x) ⪰ ρ G(x), for all x ∈ M. That is, the eigenvalues of Ric relative to the metric G are bounded from below by ρ. This is often referred to as the curvature condition CD(ρ, ∞). Proposition . allows us to lift the scalar Bakry-Émery criterion to matrix-valued functions; we can also achieve this goal by direct argument.
As a typical example, consider the n-dimensional unit sphere S n ⊂ R n+1 , equipped with the induced Riemannian structure. The associated Riemannian measure is the uniform distribution. For the sphere, the Ricci curvature tensor is constant: Ric = (n − 1)G; see [BGL , Section . ]. Therefore, the Brownian motion on S n satisfies a Bakry-Émery criterion ( . ) with c = (n − 1) −1 for n ≥ 2.
Next, consider the special orthogonal group SO(n) ⊂ R n×n with the induced Riemannian structure.
The canonical measure is the Haar probability measure. For this manifold, it is known that the eigenvalues of the Ricci tensor are bounded below by ρ = (n − 1)/4; see [Led , p. ]. Therefore, the special orthogonal group SO(n) satisfies the Bakry-Émery criterion ( . ) with c = 4/(n − 1).
There are many other Riemannian manifolds where a lower bound on the Ricci curvature is available. We refer the reader to [Led , Sec. . . ] for more examples and references.
. . Calculations of carré du champ operators. In this section, we provide calculations of carré du champ operators for the concrete examples in Section . .

. . . Example . : Sphere I. In this example, we consider the unit sphere S n ⊂ R n+1 as a Riemannian submanifold of R n+1 for n ≥ 2. The canonical Riemannian measure is the uniform probability measure σ n on the sphere.
Let (A 1 , . . . , A n+1 ) ⊂ H d be a fixed collection of Hermitian matrices. Draw a random vector x = (x 1 , . . . , x n+1 ) ∈ S n from the uniform measure; we use boldface to emphasize that x is a vector in the embedding space. Consider the matrix-valued function We can use the expression ( . ) to compute the carré du champ of f . Indeed, the ordinary gradient of f as a function on R n+1 is given by As usual, {e i } n+1 i=1 is the standard basis of R n+1 . Define the orthogonal projection Proj x = I − xx T onto the tangent space T x S n = {y ∈ R n+1 : y T x = 0}. Thus, the tangential gradient is the projection of the ordinary gradient onto the tangent space: By the expression ( . ), we can compute the carré du champ at each point x ∈ S n as This calculation verifies the formula ( . ). It is now evident that

. . . Example . : Sphere II. We maintain the setup and notation from the last subsection, and we consider the matrix-valued function Treating f as a function on the embedding space R n+1 , the ordinary gradient is given by Thus, the tangential gradient of f at a point x ∈ S n can be computed as By the expression ( . ) of the carré du champ operator, we can compute that This establishes the formula ( . ).
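For the first sphere example f (x) = Σ i x i A i , the computation can be verified numerically: projecting the gradient gives tangential components A i − x i f (x), and (using Σ i x i ² = 1) the carré du champ collapses to Σ i A i ² − f (x)², which is dominated by Σ i A i ² in the semidefinite order. The following sketch (our check, with random Hermitian A i and a random point on the sphere) confirms both claims.

```python
import numpy as np

# Numeric check (ours) for f(x) = sum_i x_i A_i on the unit sphere S^n:
# tangential gradient components D_i = A_i - x_i f(x), and
#   Gamma(f)(x) = sum_i D_i^2 = sum_i A_i^2 - f(x)^2  <=  sum_i A_i^2.
rng = np.random.default_rng(3)
n, d = 4, 3
A = [M + M.T for M in rng.standard_normal((n + 1, d, d))]  # Hermitian A_i

x = rng.standard_normal(n + 1); x /= np.linalg.norm(x)     # a point on S^n
fx = sum(xi * Ai for xi, Ai in zip(x, A))

D = [Ai - xi * fx for xi, Ai in zip(x, A)]                 # tangential gradient
Gamma = sum(Di @ Di for Di in D)
closed_form = sum(Ai @ Ai for Ai in A) - fx @ fx

assert np.allclose(Gamma, closed_form)
assert np.linalg.eigvalsh(sum(Ai @ Ai for Ai in A) - Gamma).min() > -1e-10
print("Gamma(f)(x) = sum A_i^2 - f(x)^2  <=  sum A_i^2")
```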
Here is an alternative approach. For an arbitrary matrix B ∈ H d , we can write

To study this random matrix model, we will use local geodesic/normal coordinates on the product manifold SO(d) ⊗n to compute the carré du champ; for example, see [Lee , Sec. ] and [Hal , Sec. ]. Since SO(d) ⊗n is a Lie group, we only need to consider the geodesic frame of the tangent space at the identity element (I d , . . . , I d ).
Then (V i kl : 1 ≤ i ≤ n and 1 ≤ k < l ≤ d) forms an orthonormal basis for the tangent space at the identity element of the Lie group SO(d) ⊗n , with respect to the Hilbert-Schmidt inner product: ⟨(P 1 , . . . , P n ), (Q 1 , . . . , Q n )⟩ HS = Σ n i=1 tr[P * i Q i ] for P 1 , . . . , P n , Q 1 , . . . , Q n ∈ M d .
In local geodesic coordinates, the co-metric tensor g at the origin equals the identity. Using the formula ( . ), we can compute the carré du champ as Therefore, we can obtain that This justifies the formula ( . ). Since each O i is an orthogonal matrix, the variance proxy satisfies Note that this bound is sharp because we can always choose some particular point (O 1 , . . . , O n ) to achieve equality.
. . Matrix concentration results. At last, we provide a proof of Theorem . from Theorem . and Theorem . .
Consider a compact n-dimensional Riemannian submanifold M of a Euclidean space. The uniform measure µ on M is the stationary measure of the associated Brownian motion on M. As discussed in Section . , the Brownian motion satisfies a Bakry-Émery criterion with constant c = ρ −1 if the eigenvalues of the Ricci curvature tensor are bounded below by ρ. We then apply Theorem . and Theorem . with c = ρ −1 to obtain the matrix concentration inequalities in Theorem . .
For any point x ∈ M, we can compute the carré du champ Γ(f)(x) in local normal coordinates centered at x. In this case, the co-metric tensor g is the identity matrix I_n when evaluated at x. The expression for the variance proxy of f in Theorem . then follows from the formula ( . ) for the carré du champ operator.
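In symbols, the step described in this paragraph amounts to the following sketch, assuming the coordinate expression for the matrix carré du champ takes its usual symmetrized form:

```latex
% In normal coordinates centered at x, the co-metric is g^{ij}(x) = \delta_{ij}, so
\Gamma(f)(x)
  \;=\; \frac{1}{2} \sum_{i,j=1}^{n} g^{ij}(x)
        \bigl( \partial_i f(x)\,\partial_j f(x) + \partial_j f(x)\,\partial_i f(x) \bigr)
  \;=\; \sum_{i=1}^{n} \bigl( \partial_i f(x) \bigr)^2 .
```

This is consistent with the statement in the introduction that the variance proxy involves a sum of matrix squares of the partial derivatives ∂_i f.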

Appendix A. Matrix moments and concentration

For reference, this appendix summarizes a few standard results on matrix moments and concentration. Proposition A. explains how to transfer the polynomial moment bounds in Theorem . into matrix concentration inequalities. Proposition A. states some properties of the trace mgf that are used in the proof of Theorem . . Proposition A. allows us to derive the exponential concentration inequalities in Theorem . from the exponential moment bounds in Theorem . .
As mentioned in Section , Proposition A. can be applied to the polynomial moment bounds in Theorem . to yield subgaussian concentration inequalities.
A. . The matrix Laplace transform method. We can also obtain exponential concentration inequalities via the matrix Laplace transform. Let X ∈ H_d be a random matrix. The normalized trace moment generating function (mgf) of X is defined as m(θ) := E tr e^{θX} for θ ∈ R, where tr denotes the normalized trace. This definition is due to Ahlswede and Winter [AW ]. In the proof of Theorem . , we have used some properties of the trace mgf given in the following proposition, which restates [PMT , Lemma . ].
Proposition A. (Properties of the trace mgf). Assume that X ∈ H_d is a zero-mean random matrix that is bounded in norm, and define the normalized trace mgf m(θ) := E tr e^{θX} for θ ∈ R. Then log m(θ) ≥ 0 for all θ ∈ R, and log m(0) = 0.
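The two stated properties can be verified numerically for a simple zero-mean model; the Rademacher-sign construction X = ε A below is our own toy example, not one from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
M = rng.standard_normal((d, d))
A = (M + M.T) / 2  # a fixed symmetric matrix

def herm_expm(H):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, U = np.linalg.eigh(H)
    return (U * np.exp(w)) @ U.T

# Zero-mean random matrix X = eps * A with a Rademacher sign eps = +-1.
# Normalized trace mgf: m(theta) = E tr[e^{theta X}] / d
#                                = (tr e^{theta A} + tr e^{-theta A}) / (2 d).
def m(theta):
    return 0.5 * (np.trace(herm_expm(theta * A))
                  + np.trace(herm_expm(-theta * A))) / d

for theta in np.linspace(-2.0, 2.0, 9):
    assert np.log(m(theta)) >= -1e-12  # log m(theta) >= 0
assert abs(np.log(m(0.0))) < 1e-12     # log m(0) = 0
```

Here m(θ) is the normalized trace of cosh(θA), whose eigenvalues are all at least 1, which is why log m(θ) ≥ 0 holds pointwise.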
Using the matrix Laplace transform method, one can convert estimates on the trace mgf into bounds on the extreme eigenvalues of a random matrix; for example, see [MJC + , Proposition . ]. In particular, given an explicit bound on the trace mgf, we can obtain concrete estimates on the maximum eigenvalue. See [MJC + , Section . . ] for a proof.

Proposition A. . Let X ∈ H_d be a random matrix with normalized trace mgf m(θ) := E tr e^{θX}. Assume that there are constants c_1, c_2 ≥ 0 for which

log m(θ) ≤ c_1 θ² / (2(1 − c_2 θ)) when 0 ≤ θ < 1/c_2.

Then, for all t ≥ 0,

P{λ_max(X) ≥ t} ≤ d · exp( −t² / (2(c_1 + c_2 t)) ).
We have applied Proposition A. to the trace mgf bounds in Theorem . to derive exponential concentration inequalities like those in Theorem . .

Appendix B. The mean value trace inequality
In this section, we establish the mean value trace inequality, Lemma . . This result is a generalization of [PMT , Lemmas . and . ]. The proof is similar in spirit, but it uses some additional ingredients from matrix analysis.
The key idea is to use tensorization to lift a pair of noncommuting matrices to a pair of commuting tensors. This step gives us access to tools that are not available for general matrices. For any two Hermitian matrices X, Y ∈ H_d, define a linear operator X ⊗ Y : M_d → M_d whose action is given by (X ⊗ Y)(Z) = XZY for all Z ∈ M_d. The linear operator X ⊗ Y is self-adjoint with respect to the standard inner product on M_d:

⟨(X ⊗ Y)(Z_1), Z_2⟩_{M_d} = tr[Y Z_1^* X Z_2] = tr[Z_1^* X Z_2 Y] = ⟨Z_1, (X ⊗ Y)(Z_2)⟩_{M_d} for all Z_1, Z_2 ∈ M_d.

Therefore, for any function ϕ : R → R, we can define the tensor function ϕ(X ⊗ Y) using the spectral resolution of X ⊗ Y. It is not hard to check that ϕ(X ⊗ I) = ϕ(X) ⊗ I and ϕ(I ⊗ Y) = I ⊗ ϕ(Y). Note that the tensors X ⊗ I and I ⊗ Y commute with each other, regardless of whether X and Y commute. Optimize over s > 0 to achieve the stated result.
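The structural facts about X ⊗ Y quoted in this paragraph can be checked directly in a small numerical sketch (real symmetric matrices, standard trace inner product; the Kronecker-product representation is our own device for the check, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

def rand_sym():
    M = rng.standard_normal((d, d))
    return (M + M.T) / 2

X, Y = rand_sym(), rand_sym()
Z1, Z2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def op(X, Y, Z):
    """Action of the tensor operator X (x) Y on M_d: Z -> X Z Y."""
    return X @ Z @ Y

# Self-adjointness with respect to <W, Z> = tr[W^* Z].
lhs = np.trace(op(X, Y, Z1).T @ Z2)
rhs = np.trace(Z1.T @ op(X, Y, Z2))
assert np.isclose(lhs, rhs)

# X (x) I and I (x) Y always commute, whether or not X and Y do.
assert np.allclose(op(X, np.eye(d), op(np.eye(d), Y, Z1)),
                   op(np.eye(d), Y, op(X, np.eye(d), Z1)))

# phi(X (x) I) = phi(X) (x) I for, e.g., phi = exp, checked via the
# matrix kron(I, X) of the operator X (x) I (block-diagonal copies of X).
w, U = np.linalg.eigh(X)
expX = (U * np.exp(w)) @ U.T
wK, UK = np.linalg.eigh(np.kron(np.eye(d), X))
expK = (UK * np.exp(wK)) @ UK.T
assert np.allclose(expK, np.kron(np.eye(d), expX))
```

The last check uses that kron(I, X) is block diagonal with d copies of X, so any spectral function acts copy by copy.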

Appendix C. Comparison with Stein's method
There is an established approach to proving matrix concentration inequalities using the method of exchangeable pairs; see [Cha ] for the scalar setting and [MJC + , PMT ] for matrix extensions. As mentioned in Section . , the approach in [PMT , Sections -] implicitly relies on a discrete version of the local ergodicity condition. A limiting version of this argument can also be used to derive the results in our paper. This appendix details the connection.
Given a reversible, exponentially ergodic Markov process (Z_t)_{t≥0} with a stationary measure µ, one can construct an exchangeable pair as follows. Fix a time t > 0. Let Z be drawn from the measure µ, and let Z̃ = Z_t where Z_0 = Z. By reversibility, it is easy to check that (Z, Z̃) is an exchangeable pair; that is, (Z, Z̃) has the same distribution as (Z̃, Z).
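A minimal discrete sketch of this construction, using a Metropolis chain as a stand-in for the reversible process (the chain and its target measure are our own assumptions): exchangeability of (Z, Z̃) at stationarity is exactly the symmetry of the joint law π_i P_ij.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 4

# Build a reversible chain via the Metropolis recipe: symmetric proposal
# plus acceptance ratio, which enforces detailed balance pi_i P_ij = pi_j P_ji.
pi = rng.random(m)
pi /= pi.sum()                      # target stationary measure
Q = np.full((m, m), 1.0 / m)        # symmetric proposal
P = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i != j:
            P[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()      # remaining mass stays put

# Joint law of (Z, Z~): draw Z from pi, then take one step of the chain.
J = pi[:, None] * P

assert np.allclose(J, J.T)          # (Z, Z~) =d (Z~, Z): exchangeable pair
assert np.allclose(pi @ P, pi)      # pi is stationary for P
```

For the continuous-time process, the same symmetry holds for every fixed t > 0, which is the statement in the text.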
For a zero-mean function f : Ω → H_d, define the function g_t : Ω → H_d by:

Then (f(Z), f(Z̃)) is a kernel Stein pair associated with the kernel K_t(z, z̃) = (g_t(z) − g_t(z̃))/t for all z, z̃ ∈ Ω.
By construction, for all z, z̃ ∈ Ω,

K_t(z, z̃) = −K_t(z̃, z); (C. )

E[K_t(Z, Z̃) | Z = z] = f(z).

Moreover, there is an identity that holds for any measurable function ϕ : H_d → H_d satisfying the regularity condition ‖K_t(Z, Z̃) ϕ(f(Z̃))‖ < +∞ almost surely. Paulin et al. [PMT ] use (C. ) to establish matrix Efron-Stein inequalities, much in the same way that we derive Theorem . and Theorem . . The approach we undertake in this paper is not exactly parallel with the approach of Paulin et al. [PMT ]; let us elaborate. Take the limit of g_t as t ↓ 0, using L = lim_{t↓0} (P_t − P_0)/t. We get:

Indeed, by ergodicity, one can check that:

The identity (C. ) is just a discrete version of the formula (C. ). In contrast, the argument in this paper is based on the identity:

The integral is not in the same place! Our approach is technically a bit simpler because it does not require us to justify the convergence of the integral (C. ). Nevertheless, our work is strongly inspired by the tools and techniques developed by Paulin et al. [PMT ] in the discrete setting.
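For orientation, the continuous-time identity underlying a semigroup argument of this kind can be sketched as follows, under the stated ergodicity and the normalization E_µ f = 0 (so that P_t f → 0 as t → ∞); this is our schematic reconstruction, and the paper's identity (C. ) may carry a different normalization:

```latex
f \;=\; P_0 f \;-\; \lim_{t \to \infty} P_t f
  \;=\; -\int_0^\infty \frac{d}{ds}\, P_s f \, ds
  \;=\; -\int_0^\infty L\, P_s f \, ds .
```

In the exchangeable-pair route, by contrast, the semigroup is averaged first (inside g_t) and the generator appears only in the t ↓ 0 limit, which is the sense in which "the integral is not in the same place."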

Acknowledgments
We thank Ramon van Handel for his feedback on an early version of this manuscript. He is responsible for the observation and proof that matrix Poincaré inequalities are equivalent to scalar Poincaré inequalities, and we are grateful to him for allowing us to incorporate these ideas.
DH was funded by NSF grants DMS- and DMS-. JAT gratefully acknowledges funding from ONR awards N-- and N--, and he would like to thank his family for their support in these difficult times.