The Annals of Statistics

Posterior contraction in sparse Bayesian factor models for massive covariance matrices

Debdeep Pati, Anirban Bhattacharya, Natesh S. Pillai, and David Dunson

Full-text: Open access

Abstract

Sparse Bayesian factor models are routinely implemented for parsimonious dependence modeling and dimensionality reduction in high-dimensional applications. We provide a theoretical understanding of such Bayesian procedures in terms of posterior convergence rates for inferring high-dimensional covariance matrices, where the dimension can be larger than the sample size. Under relevant sparsity assumptions on the true covariance matrix, we show that commonly used point mass mixture priors on the factor loadings lead to consistent estimation in the operator norm even when $p\gg n$. One of our major contributions is to develop a new class of continuous shrinkage priors and provide insights into their concentration around sparse vectors. Using such priors for the factor loadings, we obtain a rate of convergence similar to that obtained with point mass mixture priors. To obtain the convergence rates, we construct test functions to separate points in the space of high-dimensional covariance matrices using insights from random matrix theory; the tools developed may be of independent interest. We also derive minimax rates and show that the Bayesian posterior rates of convergence coincide with the minimax rates up to a $\sqrt{\log n}$ term.
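The abstract's premise is that when $p\gg n$ the raw sample covariance matrix is unreliable in operator norm, which motivates exploiting low-rank-plus-sparse factor structure. The following is a minimal simulation sketch of that setting; the dimensions, the sparse-loading construction, and the choice of noise level are illustrative assumptions, not the authors' procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# High-dimensional regime: dimension p much larger than sample size n,
# with a low-rank-plus-noise truth as in a sparse factor model.
p, n, k = 200, 50, 3

# Sparse loadings Lambda (p x k): all but a few rows are exactly zero.
Lambda = np.zeros((p, k))
support = rng.choice(p, size=20, replace=False)  # hypothetical sparsity level
Lambda[support, :] = rng.normal(size=(20, k))

sigma2 = 1.0
Sigma = Lambda @ Lambda.T + sigma2 * np.eye(p)  # true covariance matrix

# Draw n observations and form the (unregularized) sample covariance.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = X.T @ X / n

# Operator (spectral) norm error of the sample covariance estimator,
# the loss in which the paper's contraction rates are stated.
err = np.linalg.norm(S - Sigma, ord=2)
print(f"operator-norm error of sample covariance: {err:.2f}")
```

With $p = 200$ and $n = 50$ the error is sizable, illustrating why shrinkage priors concentrating near sparse loading vectors are needed to obtain the near-minimax rates described above.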

Article information

Source
Ann. Statist., Volume 42, Number 3 (2014), 1102-1130.

Dates
First available in Project Euclid: 20 May 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1400592653

Digital Object Identifier
doi:10.1214/14-AOS1215

Mathematical Reviews number (MathSciNet)
MR3210997

Zentralblatt MATH identifier
1305.62124

Subjects
Primary: 62G05 (Estimation); 62G20 (Asymptotic properties)

Keywords
Bayesian estimation; covariance matrix; factor model; rate of convergence; shrinkage; sparsity

Citation

Pati, Debdeep; Bhattacharya, Anirban; Pillai, Natesh S.; Dunson, David. Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Statist. 42 (2014), no. 3, 1102--1130. doi:10.1214/14-AOS1215. https://projecteuclid.org/euclid.aos/1400592653

References

  • [1] Alzer, H. (1997). On some inequalities for the incomplete gamma function. Math. Comp. 66 771–778.
  • [2] Armagan, A., Dunson, D. and Lee, J. (2011). Generalized double Pareto shrinkage. Available at arXiv:1104.0861.
  • [3] Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71 135–171.
  • [4] Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70 191–221.
  • [5] Bartholomew, D. J. (1987). Latent Variable Models and Factor Analysis. Oxford Univ. Press, New York.
  • [6] Belitser, E. and Ghosal, S. (2003). Adaptive Bayesian inference on the mean of an infinite-dimensional normal distribution. Ann. Statist. 31 536–559.
  • [7] Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • [8] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • [9] Birgé, L. (1984). Sur un théorème de minimax et son application aux tests. Probab. Math. Statist. 3 259–282.
  • [10] Bontemps, D. (2011). Bernstein–von Mises theorems for Gaussian regression with increasing number of regressors. Ann. Statist. 39 2557–2584.
  • [11] Bunea, F. and Xiao, L. (2012). On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. Available at arXiv:1212.5321.
  • [12] Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106 672–684.
  • [13] Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
  • [14] Cai, T. T. and Zhou, H. H. (2012). Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist. 40 2389–2420.
  • [15] Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q. and West, M. (2008). High-dimensional sparse factor modeling: Applications in gene expression genomics. J. Amer. Statist. Assoc. 103 1438–1456.
  • [16] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480.
  • [17] Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist. 40 2069–2101.
  • [18] El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
  • [19] Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics 147 186–197.
  • [20] Fan, J., Liao, Y. and Mincheva, M. (2011). High-dimensional covariance matrix estimation in approximate factor models. Ann. Statist. 39 3320–3356.
  • [21] Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B Stat. Methodol. 75 603–680.
  • [22] Ghosal, S. (1999). Asymptotic normality of posterior distributions in high-dimensional linear models. Bernoulli 5 315–331.
  • [23] Ghosal, S. (2000). Asymptotic normality of posterior distributions for exponential families when the number of parameters tends to infinity. J. Multivariate Anal. 74 49–68.
  • [24] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
  • [25] Ghosal, S. and van der Vaart, A. (2007). Convergence rates of posterior distributions for non-i.i.d. observations. Ann. Statist. 35 192–223.
  • [26] Giné, E. and Nickl, R. (2011). Rates on contraction for posterior distributions in $L^r$-metrics, $1\leq r\leq\infty$. Ann. Statist. 39 2883–2911.
  • [27] Hagerup, T. and Rüb, C. (1990). A guided tour of Chernoff bounds. Inform. Process. Lett. 33 305–308.
  • [28] Hans, C. (2011). Elastic net regression modeling with the orthant normal prior. J. Amer. Statist. Assoc. 106 1383–1393.
  • [29] Jiang, W. (2007). Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities. Ann. Statist. 35 1487–1511.
  • [30] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
  • [31] Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. Ann. Statist. 40 694–726.
  • [32] Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer, New York.
  • [33] Lucas, J. E., Carvalho, C., Wang, Q., Bild, A., Nevins, J. R. and West, M. (2006). Sparse statistical modelling in gene expression genomics. In Bayesian Inference for Gene Expression and Proteomics (K. A. Do, P. Müller and M. Vannucci, eds.) 155–176. Cambridge University Press, Cambridge.
  • [34] Mirsky, L. (1975). A trace inequality of John von Neumann. Monatsh. Math. 79 303–306.
  • [35] Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc. 103 681–686.
  • [36] Polson, N. G. and Scott, J. G. (2010). Shrink globally, act locally: Sparse Bayesian regularization and prediction. In Bayesian Statistics 9 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 501–538. Oxford Univ. Press, New York.
  • [37] Ray, K. (2013). Bayesian inverse problems with non-conjugate priors. Electron. J. Stat. 7 2516–2549.
  • [38] Scott, J. G. and Berger, J. O. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. Ann. Statist. 38 2587–2619.
  • [39] Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12 389–434.
  • [40] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing (Y. C. Eldar and G. Kutyniok, eds.) 210–268. Cambridge Univ. Press, Cambridge.
  • [41] West, M. (2003). Bayesian factor regression models in the “large $p$, small $n$” paradigm. In Bayesian Statistics, 7 (Tenerife, 2002) (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 733–742. Oxford Univ. Press, New York.
  • [42] Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423–435. Springer, New York.
  • [43] Zhou, M. and Carin, L. (2012). Negative binomial process count and mixture modeling. Preprint. Available at arXiv:1209.3442.

Supplemental materials