The Annals of Applied Statistics

Separable factor analysis with applications to mortality data

Bailey K. Fosdick and Peter D. Hoff

Full-text: Open access


Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model that estimates the covariance matrix for each dimension as having factor analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database, and show in a cross-validation experiment how it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that have no mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and use this information to assist with the imputations.
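The covariance structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes each dimension's covariance has the usual factor-analytic form (loadings times their transpose plus a diagonal matrix of specific variances) and that the covariance of the vectorized array is the Kronecker product of the per-dimension matrices. The array sizes, factor ranks, random loadings and the helper name `factor_cov` are all hypothetical. The parameter counts are raw counts that ignore identifiability constraints such as rotation of the loadings and the scale ambiguity between Kronecker factors.

```python
import numpy as np

def factor_cov(p, k, rng):
    """Hypothetical helper: p x p covariance with rank-k factor structure."""
    loadings = rng.normal(size=(p, k))        # illustrative factor loadings
    specific = rng.uniform(0.5, 1.5, size=p)  # illustrative positive specific variances
    return loadings @ loadings.T + np.diag(specific)

rng = np.random.default_rng(0)
dims = (10, 8, 6)   # assumed sizes, e.g. age groups x countries x periods
ranks = (2, 2, 1)   # assumed factor rank for each dimension

mode_covs = [factor_cov(p, k, rng) for p, k in zip(dims, ranks)]

# Separable covariance of the vectorized array: Kronecker product of the
# per-dimension covariances (the factor order depends on how the array is vectorized).
full_cov = mode_covs[0]
for cov in mode_covs[1:]:
    full_cov = np.kron(cov, full_cov)

# Raw parameter counts for three covariance models of the same array.
n = int(np.prod(dims))
n_unstructured = n * (n + 1) // 2
n_separable = sum(p * (p + 1) // 2 for p in dims)
n_factor = sum(p * k + p for p, k in zip(dims, ranks))

print(full_cov.shape, n_unstructured, n_separable, n_factor)
```

With these assumed sizes the unstructured covariance has far more free entries than the separable one, and the factor-analytic submodel reduces the count further, which is the motivation the abstract gives for the submodel.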

Article information

Ann. Appl. Stat., Volume 8, Number 1 (2014), 120-147.

First available in Project Euclid: 8 April 2014

Keywords

Array normal; Kronecker product; multiway data; Bayesian estimation; imputation


Fosdick, Bailey K.; Hoff, Peter D. Separable factor analysis with applications to mortality data. Ann. Appl. Stat. 8 (2014), no. 1, 120–147. doi:10.1214/13-AOAS694.

