Statistical Science

Bi-Cross-Validation for Factor Analysis

Art B. Owen and Jingshu Wang

Full-text: Open access


Factor analysis is over a century old, but it is still problematic to choose the number of factors for a given data set. We provide a systematic review of current methods and then introduce a method based on bi-cross-validation, using randomly held-out submatrices of the data to choose the optimal number of factors. We find that it performs better than many existing methods, especially when both the number of variables and the sample size are large and some of the factors are relatively weak. Our performance criterion is based on recovery of an underlying signal, equal to the product of the usual factor and loading matrices. Like previous comparisons, our work is simulation based. Recent advances in random matrix theory provide principled choices for the number of factors when the noise is homoscedastic, but not for the heteroscedastic case. The simulations we chose are designed using guidance from random matrix theory. In particular, we include factors that are asymptotically too small to detect, factors large enough to detect but not large enough to improve the estimate, and two classes of factors (weak and strong) large enough to be useful. We also find that a form of early stopping regularization improves the recovery of the signal matrix.
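The held-out-submatrix idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation, just a minimal numpy rendering of bi-cross-validation in the style of Owen and Perry (2009): partition the data matrix as X = [[A, B], [C, D]], hold out the block A, fit a rank-k truncated SVD to D, predict A by B D_k^+ C, and choose the rank k that minimizes the held-out squared error. All function and parameter names below are illustrative.

```python
# Hypothetical sketch of bi-cross-validation (BCV) for choosing the number of
# factors: hold out a random submatrix A of X = [[A, B], [C, D]], predict it
# from the retained blocks via a rank-k truncated pseudo-inverse of D, and
# pick the rank k with the smallest average held-out error.
import numpy as np

def bcv_rank(X, max_k, n_hold_rows, n_hold_cols, n_reps=20, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    errs = np.zeros(max_k + 1)
    for _ in range(n_reps):
        rows = rng.permutation(n)
        cols = rng.permutation(p)
        r0, c0 = rows[:n_hold_rows], cols[:n_hold_cols]
        r1, c1 = rows[n_hold_rows:], cols[n_hold_cols:]
        A = X[np.ix_(r0, c0)]            # held-out block
        B = X[np.ix_(r0, c1)]
        C = X[np.ix_(r1, c0)]
        D = X[np.ix_(r1, c1)]            # retained block, fit SVD here
        U, s, Vt = np.linalg.svd(D, full_matrices=False)
        for k in range(max_k + 1):
            if k == 0:
                pred = np.zeros_like(A)  # rank 0: predict the hold-out by 0
            else:
                # rank-k pseudo-inverse of D from its top k singular triples
                Dk_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T
                pred = B @ Dk_pinv @ C
            errs[k] += np.sum((A - pred) ** 2)
    return int(np.argmin(errs))

# Toy usage: a strong rank-3 signal plus homoscedastic noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80)) * 2 \
    + 0.5 * rng.standard_normal((100, 80))
print(bcv_rank(X, max_k=8, n_hold_rows=30, n_hold_cols=24))
```

With a signal this strong the selected rank should match the true rank 3; the interesting regimes studied in the paper (weak and undetectable factors, heteroscedastic noise) require the more careful simulation designs described in the text.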

Article information

Statist. Sci., Volume 31, Number 1 (2016), 119-139.

First available in Project Euclid: 10 February 2016


Keywords: parallel analysis; random matrix theory; scree plot; unwanted variation


Owen, Art B.; Wang, Jingshu. Bi-Cross-Validation for Factor Analysis. Statist. Sci. 31 (2016), no. 1, 119--139. doi:10.1214/15-STS539.

