The Annals of Statistics

High-dimensional covariance matrix estimation in approximate factor models

Jianqing Fan, Yuan Liao, and Martina Mincheva

Full-text: Open access


The variance–covariance matrix plays a central role in the inferential theories of high-dimensional factor models in finance and economics. Popular regularization methods that directly exploit sparsity are not directly applicable to many financial problems. Classical methods of estimating covariance matrices are based on strict factor models, which assume independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming a sparse error covariance matrix, we allow for cross-sectional correlation even after taking out the common factors, which enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique of Cai and Liu [J. Amer. Statist. Assoc. 106 (2011) 672–684], taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied.
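The procedure outlined in the abstract can be illustrated with a minimal sketch. It assumes the factors are observed, estimates loadings by least squares, thresholds the residual covariance entrywise, and adds the low-rank factor part back. The function name `factor_threshold_cov`, the constant `c`, the hard-thresholding rule, and the threshold form `c * sqrt(theta_ij * log(p) / T)` (in the spirit of Cai and Liu's adaptive thresholding) are illustrative assumptions, not the authors' exact specification.

```python
import numpy as np


def factor_threshold_cov(Y, F, c=2.0):
    """Sketch of a factor-based covariance estimator with a thresholded
    residual covariance.

    Y : (T, p) array of observations (e.g. asset returns).
    F : (T, K) array of observed factors (assumed observable here).
    c : threshold constant (hypothetical tuning parameter).
    """
    T, p = Y.shape
    # Centering both sides absorbs the intercept.
    Yc = Y - Y.mean(axis=0)
    Fc = F - F.mean(axis=0)

    # Least-squares loadings: coef has shape (K, p), so B is (p, K).
    coef, *_ = np.linalg.lstsq(Fc, Yc, rcond=None)
    B = coef.T

    # Residuals stand in for the unobserved idiosyncratic components.
    U = Yc - Fc @ coef
    cov_f = Fc.T @ Fc / T
    S_u = U.T @ U / T

    # Entry-dependent scale: sample variance of the products u_it * u_jt.
    # Note: this builds a (T, p, p) array, so it only suits moderate p.
    prod = U[:, :, None] * U[:, None, :]
    theta = ((prod - S_u) ** 2).mean(axis=0)

    # Hard-threshold the off-diagonal entries; keep the diagonal as is.
    tau = c * np.sqrt(theta * np.log(p) / T)
    S_u_thr = np.where(np.abs(S_u) >= tau, S_u, 0.0)
    np.fill_diagonal(S_u_thr, np.diag(S_u))

    # Low-rank factor part plus sparse idiosyncratic part.
    Sigma = B @ cov_f @ B.T + S_u_thr
    return Sigma, S_u_thr, B
```

Hard thresholding does not by itself guarantee a positive definite estimate in finite samples, so in practice the constant `c` would need to be chosen with care, for example by cross-validation.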

Article information

Ann. Statist., Volume 39, Number 6 (2011), 3320–3356.

First available in Project Euclid: 5 March 2012

Primary: 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 62F12: Asymptotic properties of estimators; 62H12: Estimation

Keywords: sparse estimation; thresholding; cross-sectional correlation; common factors; idiosyncratic; seemingly unrelated regression


Fan, Jianqing; Liao, Yuan; Mincheva, Martina. High-dimensional covariance matrix estimation in approximate factor models. Ann. Statist. 39 (2011), no. 6, 3320–3356. doi:10.1214/11-AOS944.


  • Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations. J. Amer. Statist. Assoc. 96 939–967.
  • Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71 135–171.
  • Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70 191–221.
  • Bickel, P. J. and Levina, E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • Bickel, P. J. and Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106 672–684.
  • Cai, T. and Zhou, H. (2010). Optimal rates of convergence for sparse covariance matrix estimation. Unpublished manuscript, Dept. Statistics, The Wharton School, Univ. Pennsylvania, Philadelphia, PA.
  • Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure and mean-variance analysis in large asset markets. Econometrica 51 1305–1324.
  • Connor, G. and Korajczyk, R. (1993). A test for the number of factors in an approximate factor model. J. Finance 48 1263–1291.
  • Fama, E. and French, K. (1992). The cross-section of expected stock returns. J. Finance 47 427–465.
  • Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics 147 186–197.
  • Fan, J., Zhang, J. and Yu, K. (2008). Asset allocation and risk assessment with gross exposure constraints for vast portfolios. Unpublished manuscript, Princeton Univ.
  • Gorman, M. (1981). Some Engel curves. In Essays in the Theory and Measurement of Consumer Behavior in Honor of Sir Richard Stone (A. Deaton, ed.). Cambridge Univ. Press, New York.
  • Harding, M. (2009). Structural estimation of high-dimensional factor models. Unpublished manuscript, Stanford Univ.
  • James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I 361–379. Univ. California Press, Berkeley, CA.
  • Kmenta, J. and Gilbert, R. (1970). Estimation of seemingly unrelated regressions with autoregressive disturbances. J. Amer. Statist. Assoc. 65 186–196.
  • Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
  • Lewbel, A. (1991). The rank of demand systems: Theory and nonparametric estimation. Econometrica 59 711–730.
  • Merlevède, F., Peligrad, M. and Rio, E. (2009). A Bernstein type inequality and moderate deviations for weakly dependent sequences. Unpublished manuscript, Univ. Paris Est.
  • Rothman, A. J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104 177–186.
  • Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57 348–368.