Bayesian Analysis

Optimal Bayesian Minimax Rates for Unconstrained Large Covariance Matrices

Kyoungjae Lee and Jaeyong Lee

Advance publication


We obtain the optimal Bayesian minimax rate for the unconstrained large covariance matrix of a multivariate normal sample with mean zero, when both the sample size, n, and the dimension, p, of the covariance matrix tend to infinity. Traditionally, the posterior convergence rate is used to compare the frequentist asymptotic performance of priors, but defining optimality with it is elusive. We propose a new decision-theoretic framework for prior selection and define the Bayesian minimax rate. Under the proposed framework, we obtain the optimal Bayesian minimax rate under the spectral norm for all rates of p. We also consider the Frobenius norm, Bregman divergence, and squared log-determinant loss, and obtain the optimal Bayesian minimax rate under certain rate conditions on p. A simulation study is conducted to support the theoretical results.
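The loss functions named in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' simulation code: the function name `covariance_losses`, the choice of n and p, the identity true covariance, and the use of the sample covariance as the estimator are all assumptions made for illustration. In particular, the Bregman divergence is represented here by a single member of that family, the log-determinant (Stein-type) divergence; the paper's class may be more general.

```python
import numpy as np


def covariance_losses(sigma_hat, sigma):
    """Losses between an estimate sigma_hat and a true covariance sigma.

    Hypothetical helper for illustration; both inputs are assumed to be
    symmetric positive definite p x p matrices.
    """
    p = sigma.shape[0]
    diff = sigma_hat - sigma

    # Spectral norm: largest singular value of the difference.
    spectral = np.linalg.norm(diff, ord=2)

    # Frobenius norm: square root of the sum of squared entries.
    frobenius = np.linalg.norm(diff, ord="fro")

    # Log-determinant divergence, one example of a Bregman matrix
    # divergence: tr(M) - log det(M) - p with M = sigma^{-1} sigma_hat.
    m = np.linalg.solve(sigma, sigma_hat)
    _, logdet_m = np.linalg.slogdet(m)
    logdet_divergence = np.trace(m) - logdet_m - p

    # Squared log-determinant loss: (log det sigma_hat - log det sigma)^2.
    _, logdet_hat = np.linalg.slogdet(sigma_hat)
    _, logdet_true = np.linalg.slogdet(sigma)
    squared_logdet = (logdet_hat - logdet_true) ** 2

    return {
        "spectral": float(spectral),
        "frobenius": float(frobenius),
        "logdet_divergence": float(logdet_divergence),
        "squared_logdet": float(squared_logdet),
    }


# Illustrative setup (assumed values): mean-zero normal data with
# identity covariance, n observations in dimension p, with n > p so that
# the sample covariance is invertible.
rng = np.random.default_rng(0)
n, p = 200, 10
sigma = np.eye(p)
x = rng.multivariate_normal(np.zeros(p), sigma, size=n)

# The mean is known to be zero, so no centering is applied.
sigma_hat = x.T @ x / n

losses = covariance_losses(sigma_hat, sigma)
for name, value in losses.items():
    print(f"{name}: {value:.4f}")
```

Repeating the last part over a grid of n and p values gives a rough empirical picture of how each loss scales with the dimension, which is the kind of comparison a simulation study of convergence rates relies on.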

Article information

Bayesian Anal., Advance publication (2018), 19 pages.

First available in Project Euclid: 23 February 2018

Digital Object Identifier: doi:10.1214/18-BA1094

Primary: 62C10: Bayesian problems; characterization of Bayes procedures; 62C20: Minimax procedures
Secondary: 62F15: Bayesian inference

Keywords: Bayesian minimax rate; convergence rate; decision theoretic prior selection; unconstrained covariance

Creative Commons Attribution 4.0 International License.


Lee, Kyoungjae; Lee, Jaeyong. Optimal Bayesian Minimax Rates for Unconstrained Large Covariance Matrices. Bayesian Anal., advance publication, 23 February 2018. doi:10.1214/18-BA1094.



