The Annals of Statistics

Identifying the number of factors from singular values of a large sample auto-covariance matrix

Zeng Li, Qinwen Wang, and Jianfeng Yao

Identifying the number of factors in a high-dimensional factor model has attracted much attention in recent years, and a general solution to the problem is still lacking. A promising ratio estimator based on singular values of lagged sample auto-covariance matrices has recently been proposed in the literature, with reasonably good performance under a specific assumption on the strength of the factors. Inspired by this ratio estimator, and as a first main contribution, this paper proposes a complete theory of such sample singular values for both the factor part and the noise part under the large-dimensional scheme where the dimension and the sample size grow proportionally to infinity. In particular, we provide an exact description of the phase transition phenomenon that determines whether a factor is strong enough to be detected from the observed sample singular values. Based on these findings, and as a second main contribution of the paper, we propose a new estimator of the number of factors which is strongly consistent for the detection of all significant factors (which are the only theoretically detectable ones). In particular, factors are only assumed to have a minimum strength above the phase transition boundary, which is of the order of a constant; they are thus not required to grow to infinity together with the dimension (as assumed in most existing papers on high-dimensional factor models). An empirical Monte Carlo study and an analysis of stock returns data attest to the very good performance of the proposed estimator. In all the tested cases, the new estimator largely outperforms the existing estimator using the same ratios of singular values.
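
The ratio idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the new estimator proposed in the paper, whose details are not given on this page; it only shows a generic ratio rule applied to the singular values of a lag-1 sample auto-covariance matrix. The function names, the centering step, the use of squared singular values, the cap `kmax`, and the simulated AR(1) factor model are all assumptions made for illustration.

```python
import numpy as np

def lag_autocov(X, lag=1):
    """Lag-`lag` sample auto-covariance of a (T, p) series (rows are time points).

    Centering the series is an assumption of this sketch.
    """
    T = X.shape[0]
    Xc = X - X.mean(axis=0)
    # Sum over t of x_{t+lag} x_t^T, scaled by 1/T.
    return Xc[lag:].T @ Xc[:-lag] / T

def ratio_estimate(X, lag=1, kmax=10):
    """Generic ratio rule: minimise consecutive ratios of squared singular values."""
    s = np.linalg.svd(lag_autocov(X, lag), compute_uv=False)  # sorted descending
    lam = s ** 2
    kmax = min(kmax, len(lam) - 1)          # need lam[kmax] to exist
    ratios = lam[1:kmax + 1] / lam[:kmax]   # ratios[i] = lam[i+1] / lam[i]
    return int(np.argmin(ratios)) + 1, ratios

def simulate(T=400, p=40, k=2, seed=0):
    """Hypothetical test data: k AR(1) factors, Gaussian loadings, white noise."""
    rng = np.random.default_rng(seed)
    f = np.zeros((T, k))
    for t in range(1, T):
        f[t] = 0.8 * f[t - 1] + rng.standard_normal(k)
    A = rng.standard_normal((p, k))         # loading matrix
    return f @ A.T + rng.standard_normal((T, p))

if __name__ == "__main__":
    X = simulate()
    k_hat, ratios = ratio_estimate(X)
    print("estimated number of factors:", k_hat)
    print("ratios:", np.round(ratios, 4))
```

Printing the ratios alongside the estimate makes it easy to see where the sharpest drop in singular values occurs. A rule of this kind relies on the factor singular values being well separated from the noise singular values, which is the regime the phase transition result in the abstract characterises.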

Article information

Ann. Statist., Volume 45, Number 1 (2017), 257-288.

Received: June 2015
Revised: February 2016
First available in Project Euclid: 21 February 2017

Primary: 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]
Primary: 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 15B52: Random matrices

Keywords: High-dimensional factor model; high-dimensional time series; large sample auto-covariance matrices; spiked population model; number of factors; phase transition; random matrices


Li, Zeng; Wang, Qinwen; Yao, Jianfeng. Identifying the number of factors from singular values of a large sample auto-covariance matrix. Ann. Statist. 45 (2017), no. 1, 257–288. doi:10.1214/16-AOS1452.

  • Alessi, L., Barigozzi, M. and Capasso, M. (2010). Improved penalization for determining the number of factors in approximate factor models. Statist. Probab. Lett. 80 1806–1813.
  • Bai, J. and Li, K. (2012). Statistical analysis of factor models of high dimension. Ann. Statist. 40 436–465.
  • Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70 191–221.
  • Bai, J. and Ng, S. (2007). Determining the number of primitive shocks in factor models. J. Bus. Econom. Statist. 25 52–60.
  • Bai, Z. and Yao, J. (2008). Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. Henri Poincaré Probab. Stat. 44 447–474.
  • Bai, Z. and Yao, J. (2012). On sample eigenvalues in a generalized spiked population model. J. Multivariate Anal. 106 167–177.
  • Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97 1382–1408.
  • Benaych-Georges, F., Guionnet, A. and Maida, M. (2011). Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices. Electron. J. Probab. 16 1621–1662.
  • Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227 494–521.
  • Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation. Rev. Econ. Stat. 82 540–554.
  • Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2004). The generalized dynamic factor model: Consistency and rates. J. Econometrics 119 231–255.
  • Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2005). The generalized dynamic factor model: One-sided estimation and forecasting. J. Amer. Statist. Assoc. 100 830–840.
  • Geweke, J. (1977). The dynamic factor analysis of economic time series. In Latent Variables in Socio-Economic Models (D. J. Aigner and A. S. Goldberger, eds.). North-Holland, Amsterdam.
  • Hallin, M. and Liška, R. (2007). Determining the number of factors in the general dynamic factor model. J. Amer. Statist. Assoc. 102 603–617.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. Ann. Statist. 40 694–726.
  • Li, Z., Pan, G. and Yao, J. (2015). On singular value distribution of large-dimensional autocovariance matrices. J. Multivariate Anal. 137 119–140.
  • Li, Z., Wang, Q. W. and Yao, J. (2016). Supplement to “Identifying the number of factors from singular values of a large sample auto-covariance matrix.” DOI:10.1214/16-AOS1452SUPP.
  • Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Rev. Econ. Stat. 92 1004–1016.
  • Onatski, A. (2012). Asymptotics of the principal components estimator of large factor models with weakly influential factors. J. Econometrics 168 244–258.
  • Onatski, A. (2015). Asymptotic analysis of the squared estimation error in misspecified factor models. J. Econometrics 186 388–406.
  • Passemier, D. and Yao, J.-F. (2012). On determining the number of spikes in a high-dimensional spiked population model. Random Matrices Theory Appl. 1 1150002, 19.
  • Sargent, T. J. and Sims, C. A. (1977). Business cycle modeling without pretending to have too much a priori economic theory. In New Methods in Business Cycle Research, Vol. 1 45–109. Federal Reserve Bank of Minneapolis, Minneapolis.
  • Stock, J. H. and Watson, M. W. (2011). Dynamic factor models. In The Oxford Handbook of Economic Forecasting 35–59. Oxford Univ. Press, Oxford.
  • Wang, Q. W. and Yao, J. (2016). Moment approach for singular values distribution of a large auto-covariance matrix. Ann. Inst. Henri Poincaré Probab. Stat. 52 1641–1666.

Supplemental materials

  • Supplement to “Identifying the number of factors from singular values of a large sample auto-covariance matrix”. A supplementary file [Li, Wang and Yao (2016)] collects several technical proofs used in the paper.