Annals of Probability

Efron–Stein inequalities for random matrices

Daniel Paulin, Lester Mackey, and Joel A. Tropp

Full-text: Open access


This paper establishes new concentration inequalities for random matrices constructed from independent random variables. These results are analogous to the generalized Efron–Stein inequalities developed by Boucheron et al. The proofs rely on the method of exchangeable pairs.
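For context, the classical scalar Efron–Stein inequality (the prototype that the paper's matrix results generalize) states that for independent random variables X_1, …, X_n and a function f, Var f(X) ≤ (1/2) Σ_i E[(f(X) − f(X^{(i)}))²], where X^{(i)} replaces the i-th coordinate by an independent copy. Below is a minimal numerical sanity check of this scalar bound; the test function (the spectral norm of a 4×4 random sign matrix) is an illustrative choice and not taken from the paper.

```python
import numpy as np

# Empirical check of the scalar Efron-Stein inequality:
#   Var f(X_1, ..., X_n) <= (1/2) * sum_i E[(f(X) - f(X^(i)))^2],
# where X^(i) replaces coordinate i with an independent copy.
# The test function f (spectral norm of a 4x4 sign matrix) is illustrative only.

rng = np.random.default_rng(0)
d = 4
n = d * d          # number of independent entries
trials = 2000

def f(x):
    # largest singular value (spectral norm) of the matrix with entries x
    return np.linalg.norm(x.reshape(d, d), 2)

vals = []
es_sum = 0.0
for _ in range(trials):
    x = rng.choice([-1.0, 1.0], size=n)
    vals.append(f(x))
    for i in range(n):
        # resample coordinate i with an independent copy
        y = x.copy()
        y[i] = rng.choice([-1.0, 1.0])
        es_sum += (f(x) - f(y)) ** 2

variance = np.var(vals)                 # Monte Carlo estimate of Var f(X)
es_bound = es_sum / (2 * trials)        # Monte Carlo estimate of the ES bound
print(variance <= es_bound)             # the bound should hold (up to sampling noise)
```

With a moderate number of trials the estimated bound comfortably dominates the estimated variance, since Efron–Stein is typically loose by a constant factor for functions like the spectral norm.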

Article information

Ann. Probab., Volume 44, Number 5 (2016), 3431–3473.

Received: August 2014
Revised: August 2015
First available in Project Euclid: 21 September 2016

Primary: 60B20: Random matrices (probabilistic aspects; for algebraic aspects see 15B52) 60E15: Inequalities; stochastic orderings
Secondary: 60G09: Exchangeability 60F10: Large deviations

Keywords: concentration inequalities; Stein’s method; random matrix; noncommutative; exchangeable pairs; coupling; bounded differences; Efron–Stein inequality; trace inequality


Paulin, Daniel; Mackey, Lester; Tropp, Joel A. Efron–Stein inequalities for random matrices. Ann. Probab. 44 (2016), no. 5, 3431–3473. doi:10.1214/15-AOP1054.



  • Adamczak, R., Litvak, A. E., Pajor, A. and Tomczak-Jaegermann, N. (2011). Sharp bounds on the rate of convergence of the empirical covariance matrix. C. R. Math. Acad. Sci. Paris 349 195–200.
  • Ahlswede, R. and Winter, A. (2002). Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory 48 569–579.
  • Avron, H. and Toledo, S. (2014). Effective stiffness: Generalizing effective resistance sampling to finite element matrices. Available at arXiv:1110.4437.
  • Bhatia, R. (2007). Positive Definite Matrices. Princeton Univ. Press, Princeton, NJ.
  • Boucheron, S., Lugosi, G. and Massart, P. (2003). Concentration inequalities using the entropy method. Ann. Probab. 31 1583–1614.
  • Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford Univ. Press, Oxford.
  • Boucheron, S., Bousquet, O., Lugosi, G. and Massart, P. (2005). Moment inequalities for functions of independent random variables. Ann. Probab. 33 514–560.
  • Burda, Z., Jarosz, A., Nowak, M. A., Jurkiewicz, J., Papp, G. and Zahed, I. (2011). Applying free random variables to random matrix analysis of financial data. Part I: The Gaussian case. Quant. Finance 11 1103–1124.
  • Cai, T. T. and Jiang, T. (2011). Limiting laws of coherence of random matrices with applications to testing covariance structure and construction of compressed sensing matrices. Ann. Statist. 39 1496–1525.
  • Carlen, E. (2010). Trace inequalities and quantum entropy: An introductory course. In Entropy and the Quantum. Contemp. Math. 529 73–140. Amer. Math. Soc., Providence, RI.
  • Chatterjee, S. (2005). Concentration inequalities with exchangeable pairs. Ph.D. thesis, Stanford Univ. Available at arXiv:math/0507526.
  • Chatterjee, S. (2007). Stein’s method for concentration inequalities. Probab. Theory Related Fields 138 305–321.
  • Chen, R. Y., Gittens, A. and Tropp, J. A. (2012). The masked sample covariance estimator: An analysis via the matrix Laplace transform method. Information and Inference 1 2–20. DOI:10.1093/imaiai/ias001.
  • Chen, R. Y. and Tropp, J. A. (2014). Subadditivity of matrix $\phi$-entropy and concentration of random matrices. Electron. J. Probab. 19 no. 27, 30.
  • Collins, B., McDonald, D. and Saad, N. (2013). Compound Wishart matrices and noisy covariance matrices: Risk underestimation. Available at arXiv:1306.5510.
  • Dalvi, N., Dasgupta, A., Kumar, R. and Rastogi, V. (2013). Aggregating crowdsourced binary ratings. In Proceedings of the 22nd International Conference on World Wide Web, WWW’13 285–294. Republic and Canton of Geneva, Switzerland.
  • Dobrushin, R. L. (1970). Prescribing a system of random variables by conditional distributions. Theory Probab. Appl. 15 458–486.
  • Dudley, R. M. (2002). Real Analysis and Probability. Cambridge Studies in Advanced Mathematics 74. Cambridge Univ. Press, Cambridge.
  • Koltchinskii, V. (2011). Von Neumann entropy penalization and low-rank matrix estimation. Ann. Statist. 39 2936–2973.
  • Levin, D. A., Peres, Y. and Wilmer, E. L. (2009). Markov Chains and Mixing Times. Amer. Math. Soc., Providence, RI.
  • Machart, P. and Ralaivola, L. (2012). Confusion matrix stability bounds for multiclass classification. Available at arXiv:1202.6221.
  • Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B. and Tropp, J. A. (2014). Matrix concentration inequalities via the method of exchangeable pairs. Ann. Probab. 42 906–945.
  • Morvant, E., Koço, S. and Ralaivola, L. (2012). PAC-Bayesian generalization bound on confusion matrix for multi-class classification. In Proceedings of the 29th International Conference on Machine Learning (ICML-12) 815–822. International Machine Learning Society.
  • Netrapalli, P., Jain, P. and Sanghavi, S. (2013). Phase retrieval using alternating minimization. In Advances in Neural Information Processing Systems 2796–2804. Neural Information Processing Systems Foundation.
  • Oliveira, R. I. (2009). Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at arXiv:0911.0600.
  • Oliveira, R. I. (2010). Sums of random Hermitian matrices and an inequality by Rudelson. Electron. Commun. Probab. 15 203–212.
  • Paulin, D. (2012). A Note on matrix concentration inequalities via the method of exchangeable pairs. Available at arXiv:1212.2012.
  • Paulin, D., Mackey, L. and Tropp, J. A. (2013). Deriving matrix concentration inequalities from kernel couplings. Available at arXiv:1305.0612.
  • Petz, D. (1994). A survey of certain trace inequalities. In Functional Analysis and Operator Theory (Warsaw, 1992). Banach Center Publ. 30 287–298. Polish Acad. Sci., Warsaw.
  • Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_{1}$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
  • Robbins, H. (1955). A remark on Stirling’s formula. Amer. Math. Monthly 62 26–29.
  • Shao, Q.-M. and Zhou, W.-X. (2014). Necessary and sufficient conditions for the asymptotic distributions of coherence of ultra-high dimensional random matrices. Ann. Probab. 42 623–648.
  • Soloveychik, I. (2014). Error bound for compound Wishart matrices. Available at arXiv:1402.5581.
  • Speicher, R. (1998). Combinatorial theory of the free product with amalgamation and operator-valued free probability theory. Mem. Amer. Math. Soc. 132 x+88.
  • Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability Theory 583–602. Univ. California Press, Berkeley, CA.
  • Stein, C. (1986). Approximate Computation of Expectations. IMS, Hayward, CA.
  • Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 389–434.
  • Tropp, J. A. (2015). An introduction to matrix concentration inequalities. Found. Trends Mach. Learn. 8. Available at arXiv:1501.01571.
  • Wigderson, A. and Xiao, D. (2008). Derandomizing the Ahlswede–Winter matrix-valued Chernoff bound using pessimistic estimators, and applications. Theory Comput. 4 53–76.
  • Zhou, E. and Hu, J. (2014). Gradient-based adaptive stochastic search for non-differentiable optimization. IEEE Trans. Automat. Control 59 1818–1832.