The Annals of Applied Statistics

Hierarchical infinite factor models for improving the prediction of surgical complications for geriatric patients

Elizabeth Lorenzi, Ricardo Henao, and Katherine Heller



Nearly a third of all surgeries performed in the United States involve patients over the age of 65; these older adults experience a higher rate of postoperative morbidity and mortality. To improve care for these patients, we aim to identify and characterize high-risk geriatric patients to send to a specialized perioperative clinic while leveraging the overall surgical population to improve learning. To this end, we develop a hierarchical infinite latent factor model (HIFM) to appropriately account for the covariance structure across subpopulations in the data. We propose a novel Hierarchical Dirichlet Process shrinkage prior on the loadings matrix that flexibly captures the underlying structure of our data while sharing information across subpopulations to improve inference and prediction. The stick-breaking construction of the prior assumes an infinite number of factors and allows each subpopulation to utilize different subsets of the factor space and select the number of factors needed to best explain the variation. We develop the model into a latent factor regression method that excels at prediction and inference of regression coefficients. Simulations validate this strong performance compared to baseline methods. We apply this work to the problem of predicting surgical complications using electronic health record data for geriatric patients and all surgical patients at Duke University Health System (DUHS). The motivating application demonstrates the improved predictive performance of HIFM in both area under the ROC curve and area under the PR curve while providing interpretable coefficients that may lead to actionable interventions.
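The stick-breaking idea mentioned in the abstract can be illustrated with a small sketch. This is not the HIFM prior itself (the full text is not reproduced here, and the paper places its prior on the loadings matrix); it only shows the generic truncated stick-breaking weights of a Dirichlet process (Sethuraman, 1994) and the group-level stick-breaking of a hierarchical Dirichlet process (Teh et al., 2006). The function names, the truncation level, and the concentration values are assumptions chosen for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, num_sticks, rng):
    """Truncated stick-breaking weights for a Dirichlet process.

    v_k ~ Beta(1, alpha); weight_k = v_k * prod_{j<k} (1 - v_j).
    """
    v = rng.beta(1.0, alpha, size=num_sticks)
    # Length of stick remaining before each break.
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def group_weights(global_weights, alpha0, rng):
    """Group-level weights centred on shared global weights.

    Uses the HDP stick-breaking form, truncated to len(global_weights):
    v_k ~ Beta(alpha0 * beta_k, alpha0 * (1 - sum_{l<=k} beta_l)).
    """
    beta = np.clip(global_weights, 1e-12, None)
    # Clip the tail mass so both Beta parameters stay strictly positive.
    tail = np.clip(1.0 - np.cumsum(beta), 1e-12, None)
    v = rng.beta(alpha0 * beta, alpha0 * tail)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


rng = np.random.default_rng(0)
# Hypothetical settings: 10 retained components, two subpopulations.
shared = stick_breaking_weights(alpha=2.0, num_sticks=10, rng=rng)
geriatric = group_weights(shared, alpha0=5.0, rng=rng)
all_surgical = group_weights(shared, alpha0=5.0, rng=rng)
```

In this sketch both subpopulations draw their weights around the same shared vector, so they use a common set of components while weighting them differently. Because the construction is truncated, each weight vector sums to slightly less than one; the missing mass belongs to the components beyond the truncation level.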

Article information

Ann. Appl. Stat., Volume 13, Number 4 (2019), 2637-2661.

Received: July 2018
Revised: May 2019
First available in Project Euclid: 28 November 2019


Keywords: Bayesian; factor model; nonparametrics; transfer learning; hierarchical modeling; surgical outcomes; health care


Lorenzi, Elizabeth; Henao, Ricardo; Heller, Katherine. Hierarchical infinite factor models for improving the prediction of surgical complications for geriatric patients. Ann. Appl. Stat. 13 (2019), no. 4, 2637–2661. doi:10.1214/19-AOAS1292.




Supplemental materials

  • A. Proofs of HIFM properties. Properties of hierarchical infinite factor model prior on loadings matrix.
  • B. Inference for full model. All steps needed to sample the model.
  • C. Variable definitions shown in Figure 4. Description of variable names shown in Figure 4.