The Annals of Statistics

Statistical analysis of factor models of high dimension

Jushan Bai and Kunpeng Li


This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the maximum likelihood estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows for heteroskedasticity, with the idiosyncratic variances estimated jointly with the other parameters. The efficiency of the MLE relative to the principal components method is also considered.
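
For orientation, the setting described above can be written as a standard factor model with heteroskedastic idiosyncratic errors. This is a sketch only: the symbols ($z_t$, $\alpha$, $\Lambda$, $f_t$, $e_t$, $\Phi$, $M_{zz}$, $M_{ff}$, $r$) are assumptions chosen for illustration and may differ from the notation used in the paper itself.

```latex
% Observation equation: N variables, T observations, r common factors
z_t = \alpha + \Lambda f_t + e_t, \qquad t = 1, \dots, T,
\qquad z_t \in \mathbb{R}^{N},\; f_t \in \mathbb{R}^{r}

% Implied covariance structure; \Phi carries one variance per variable
\Sigma_{zz} = \Lambda M_{ff} \Lambda' + \Phi,
\qquad \Phi = \operatorname{diag}(\sigma_1^2, \dots, \sigma_N^2)

% Gaussian (quasi-)log-likelihood, up to an additive constant and scaled by N;
% M_{zz} denotes the sample covariance matrix of z_t
\ln L(\Lambda, M_{ff}, \Phi)
  = -\frac{1}{2N} \ln \lvert \Sigma_{zz} \rvert
    - \frac{1}{2N} \operatorname{tr}\!\left( M_{zz} \Sigma_{zz}^{-1} \right)
```

Written this way, the heteroskedasticity enters through the distinct diagonal entries of $\Phi$, which are estimated together with $\Lambda$. Since $\Lambda$ and $M_{ff}$ are not separately determined by $\Sigma_{zz}$, some set of identification restrictions has to be imposed before the estimators are well defined.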

Article information

Ann. Statist., Volume 40, Number 1 (2012), 436-465.

First available in Project Euclid: 16 April 2012

Primary: 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 62F12: Asymptotic properties of estimators

High-dimensional factor models; maximum likelihood estimation; factors; factor loadings; idiosyncratic variances; principal components


Bai, Jushan; Li, Kunpeng. Statistical analysis of factor models of high dimension. Ann. Statist. 40 (2012), no. 1, 436--465. doi:10.1214/11-AOS966.


  • [1] Amemiya, Y., Fuller, W. A. and Pantula, S. G. (1987). The asymptotic distributions of some estimators for a factor analysis model. J. Multivariate Anal. 22 51–64.
  • [2] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ.
  • [3] Anderson, T. W. and Amemiya, Y. (1988). The asymptotic normal distribution of estimators in factor analysis under general conditions. Ann. Statist. 16 759–771.
  • [4] Anderson, T. W. and Rubin, H. (1956). Statistical inference in factor analysis. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, Vol. V 111–150. Univ. California Press, Berkeley.
  • [5] Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica 71 135–171.
  • [6] Bai, J. and Li, K. (2012). Supplement to “Statistical analysis of factor models of high dimension.” DOI:10.1214/11-AOS966SUPP.
  • [7] Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70 191–221.
  • [8] Bai, J. and Ng, S. (2010). Principal components estimation and identification of the factors. Unpublished manuscript, Columbia Univ.
  • [9] Breitung, J. and Tenhofen, J. (2008). GLS estimation of dynamic factor models. Working paper, Univ. Bonn.
  • [10] Campbell, J. Y., Lo, A. W. and MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton Univ. Press, Princeton, NJ.
  • [11] Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51 1281–1304.
  • [12] Choi, I. (2007). Efficient estimation of factor models. Working paper. Available at
  • [13] Connor, G. and Korajczyk, R. A. (1988). Risk and return in an equilibrium APT: Application of a new test methodology. Journal of Financial Economics 21 255–289.
  • [14] Doz, C., Giannone, D. and Reichlin, L. (2006). A quasi-maximum likelihood approach for large approximate dynamic factor models. Discussion Paper 5724, CEPR.
  • [15] Forni, M., Hallin, M., Lippi, M. and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation. Rev. Econom. Statist. 82 540–554.
  • [16] Geweke, J. and Zhou, G. (1996). Measuring the price of the arbitrage pricing theory. The Review of Financial Studies 9 557–587.
  • [17] Goyal, A., Perignon, C. and Villa, C. (2008). How common are common return factors across the NYSE and Nasdaq? Journal of Financial Economics 90 252–271.
  • [18] Lawley, D. N. and Maxwell, A. E. (1971). Factor Analysis as a Statistical Method, 2nd ed. Elsevier, New York.
  • [19] Newey, W. K. and McFadden, D. (1994). Large sample estimation and hypothesis testing. In Handbook of Econometrics, Vol. IV (R. F. Engle and D. McFadden, eds.). Handbooks in Economics 2 2111–2245. North-Holland, Amsterdam.
  • [20] Ross, S. A. (1976). The arbitrage theory of capital asset pricing. J. Econom. Theory 13 341–360.
  • [21] Rubin, D. B. and Thayer, D. T. (1982). EM algorithms for ML factor analysis. Psychometrika 47 69–76.
  • [22] Stock, J. H. and Watson, M. W. (2002). Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97 1167–1179.
  • [23] Stock, J. H. and Watson, M. W. (2002). Macroeconomic forecasting using diffusion indexes. J. Bus. Econom. Statist. 20 147–162.
  • [24] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
  • [25] Wu, C. F. J. (1983). On the convergence properties of the EM algorithm. Ann. Statist. 11 95–103.

Supplemental materials

  • Supplementary material: Supplement to “Statistical analysis of factor models of high dimension”. In this supplement we provide the detailed proofs for Theorems 5.1–5.4 and 6.1. We also give a simple and direct proof that the EM solutions satisfy the first order conditions. Remarks are given on how to make use of matrix properties to write a faster computer program.
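
As background for the EM remark above, the textbook EM iteration for maximum likelihood factor analysis (in the style of Rubin and Thayer, reference [21]) can be sketched as follows. This is not taken from the supplement: it assumes the normalization $f_t \sim N(0, I_r)$, and the symbols $\beta$, $C$, $M_{zz}$ and the iteration superscripts are illustrative notation.

```latex
% Current iterates: \Lambda^{(k)} (N x r), \Phi^{(k)} (N x N, diagonal)
\Sigma^{(k)} = \Lambda^{(k)} \Lambda^{(k)\prime} + \Phi^{(k)}

% E-step: regression coefficients of f_t on z_t and the implied
% average second moment of the factors
\beta^{(k)} = \Lambda^{(k)\prime} \left( \Sigma^{(k)} \right)^{-1},
\qquad
C^{(k)} = I_r - \beta^{(k)} \Lambda^{(k)} + \beta^{(k)} M_{zz} \beta^{(k)\prime}

% M-step: update the loadings, then the idiosyncratic variances
\Lambda^{(k+1)} = M_{zz} \beta^{(k)\prime} \left( C^{(k)} \right)^{-1},
\qquad
\Phi^{(k+1)} = \operatorname{diag}\!\left( M_{zz} - \Lambda^{(k+1)} \beta^{(k)} M_{zz} \right)
```

One matrix property that helps when $N$ is large is the identity $\Lambda' \Sigma^{-1} = (I_r + \Lambda' \Phi^{-1} \Lambda)^{-1} \Lambda' \Phi^{-1}$, valid for $\Sigma = \Lambda \Lambda' + \Phi$ with $\Phi$ positive definite. It replaces an $N \times N$ inverse with an $r \times r$ one plus the trivial inverse of the diagonal matrix $\Phi$.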