Annals of Probability

Matrix concentration inequalities via the method of exchangeable pairs

Lester Mackey, Michael I. Jordan, Richard Y. Chen, Brendan Farrell, and Joel A. Tropp

This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein’s method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables.
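
As a rough numerical illustration of the kind of bound described above, the sketch below checks one standard Hoeffding-type tail bound for a matrix Rademacher series, namely P{λ_max(Σ_k ε_k A_k) ≥ t} ≤ d · exp(−t² / (2σ²)) with σ² = ‖Σ_k A_k²‖, where the A_k are fixed d × d Hermitian matrices and the ε_k are independent random signs. This particular form is assumed here for illustration and is not quoted from the paper; the function names and the choice of random test matrices are hypothetical.

```python
import numpy as np


def rademacher_series_tail(coeffs, t, trials=2000, seed=0):
    """Monte Carlo estimate of P{lambda_max(sum_k eps_k A_k) >= t}.

    coeffs has shape (n, d, d) and holds fixed real symmetric matrices A_k.
    """
    rng = np.random.default_rng(seed)
    n = coeffs.shape[0]
    # Independent Rademacher signs, one row per trial.
    signs = rng.choice([-1.0, 1.0], size=(trials, n))
    # Form sum_k eps_k A_k for every trial at once: shape (trials, d, d).
    sums = np.einsum("tk,kij->tij", signs, coeffs)
    # eigvalsh returns eigenvalues in ascending order; take the largest.
    lam_max = np.linalg.eigvalsh(sums)[:, -1]
    return float(np.mean(lam_max >= t))


def hoeffding_type_bound(coeffs, t):
    """Evaluate d * exp(-t^2 / (2 sigma^2)) with sigma^2 = ||sum_k A_k^2||."""
    d = coeffs.shape[1]
    # sum_k A_k @ A_k, followed by its spectral norm.
    variance = np.linalg.norm(np.einsum("kij,kjl->il", coeffs, coeffs), 2)
    return float(d * np.exp(-t**2 / (2.0 * variance)))


def random_symmetric(n, d, seed=1):
    """Hypothetical test data: n fixed real symmetric d x d matrices."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, d, d))
    return (g + np.transpose(g, (0, 2, 1))) / 2.0
```

Calling `rademacher_series_tail(coeffs, t)` and `hoeffding_type_bound(coeffs, t)` on the same `coeffs = random_symmetric(20, 5)` for a few values of `t` lets one compare the simulated tail frequency with the bound. The dimensional factor `d` typically makes the bound conservative for small matrices.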

Article information

Ann. Probab., Volume 42, Number 3 (2014), 906–945.

First available in Project Euclid: 26 March 2014

Subjects

Primary:
60B20: Random matrices (probabilistic aspects; for algebraic aspects see 15B52)
60E15: Inequalities; stochastic orderings
Secondary:
60G09: Exchangeability
60F10: Large deviations

Keywords

Concentration inequalities; moment inequalities; Stein’s method; exchangeable pairs; random matrix; noncommutative


Mackey, Lester; Jordan, Michael I.; Chen, Richard Y.; Farrell, Brendan; Tropp, Joel A. Matrix concentration inequalities via the method of exchangeable pairs. Ann. Probab. 42 (2014), no. 3, 906–945. doi:10.1214/13-AOP892.
