Bayesian Analysis
Volume 15, Number 3 (2020), 937–963
Infinite Mixtures of Infinite Factor Analysers
Keefe Murphy, Cinzia Viroli, and Isobel Claire Gormley
Abstract
Factor-analytic Gaussian mixtures are often employed as a model-based approach to clustering high-dimensional data. Typically, the numbers of clusters and latent factors must be fixed in advance of model fitting. The pair which optimises some model selection criterion is then chosen. For computational reasons, having the number of factors differ across clusters is rarely considered.
Here the infinite mixture of infinite factor analysers (IMIFA) model is introduced. IMIFA employs a Pitman-Yor process prior to facilitate automatic inference of the number of clusters using the stick-breaking construction and a slice sampler. Automatic inference of the cluster-specific numbers of factors is achieved using multiplicative gamma process shrinkage priors and an adaptive Gibbs sampler. IMIFA is presented as the flagship of a family of factor-analytic mixtures.
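To make these two nonparametric ingredients concrete, below is a minimal R sketch (illustrative only, not the IMIFA package's implementation; hyperparameter values and function names are chosen for exposition). It simulates truncated Pitman-Yor stick-breaking weights and draws a loadings matrix from the multiplicative gamma process (MGP) shrinkage prior of Bhattacharya and Dunson (2011).

set.seed(1)

## Pitman-Yor stick-breaking construction, truncated at K sticks:
## v_k ~ Beta(1 - d, a + k*d), with discount 0 <= d < 1 and
## concentration a > -d; the mixing weights are
## w_k = v_k * prod_{j < k} (1 - v_j).
py_stick_break <- function(K, a = 0.5, d = 0.25) {
  v <- rbeta(K, 1 - d, a + seq_len(K) * d)
  v * cumprod(c(1, 1 - v[-K]))
}
w <- py_stick_break(K = 25)
sum(w)  # close to 1; the shortfall is truncation error

## MGP shrinkage prior on a p x q loadings matrix Lambda:
## lambda_{jh} ~ N(0, 1 / (phi_{jh} * tau_h)), with local precisions
## phi_{jh} ~ Gamma(nu/2, nu/2) and global column precisions
## tau_h = prod_{l <= h} delta_l, where delta_1 ~ Gamma(a1, 1) and
## delta_l ~ Gamma(a2, 1) for l >= 2; taking a2 > 1 makes the tau_h
## tend to increase, shrinking higher-numbered columns towards zero.
mgp_loadings <- function(p, q, nu = 3, a1 = 2, a2 = 3) {
  delta <- c(rgamma(1, a1, 1), rgamma(q - 1, a2, 1))
  tau   <- cumprod(delta)                        # column precisions
  phi   <- matrix(rgamma(p * q, nu / 2, nu / 2), p, q)
  matrix(rnorm(p * q), p, q) / sqrt(sweep(phi, 2, tau, `*`))
}
Lambda <- mgp_loadings(p = 50, q = 10)
round(apply(Lambda, 2, sd), 2)  # column scales decay as the index grows

In IMIFA itself these quantities are not drawn once from the prior as above: the weights are updated within the slice sampler, and the loadings columns are updated within the adaptive Gibbs sampler, which adds or deletes columns as the shrinkage warrants.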
Applications to benchmark data, metabolomic spectral data, and a handwritten digit example illustrate the IMIFA model’s advantageous features. These include obviating the need for model selection criteria, reducing the computational burden associated with searching the model space, improving clustering performance by allowing cluster-specific numbers of factors, and quantifying uncertainty.
Article information
Source
Bayesian Anal., Volume 15, Number 3 (2020), 937-963.
Dates
First available in Project Euclid: 9 October 2019
Permanent link to this document
https://projecteuclid.org/euclid.ba/1570586978
Digital Object Identifier
doi:10.1214/19-BA1179
Keywords
model-based clustering, factor analysis, Pitman-Yor process, multiplicative gamma process, adaptive Markov chain Monte Carlo
Rights
Creative Commons Attribution 4.0 International License.
Citation
Murphy, Keefe; Viroli, Cinzia; Gormley, Isobel Claire. Infinite Mixtures of Infinite Factor Analysers. Bayesian Anal. 15 (2020), no. 3, 937–963. doi:10.1214/19-BA1179. https://projecteuclid.org/euclid.ba/1570586978
References
- Baek, J., McLachlan, G. J., and Flack, L. K. (2010). “Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7): 1298–1309.
- Bai, J. and Li, K. (2012). “Statistical analysis of factor models of high dimension.” The Annals of Statistics, 40(1): 436–465.
- Bhattacharya, A. and Dunson, D. B. (2011). “Sparse Bayesian infinite factor models.” Biometrika, 98(2): 291–306.
- Brooks, S. P. and Gelman, A. (1998). “General methods for monitoring convergence of iterative simulations.” Journal of Computational and Graphical Statistics, 7(4): 434–455.
- Carmody, S. and Brennan, L. (2010). “Effects of pentylenetetrazole-induced seizures on metabolomic profiles of rat brain.” Neurochemistry International, 56(2): 340–344.
- Carmona, C., Nieto-Barajas, L., and Canale, A. (2019). “Model-based approach for household clustering with mixed scale variables.” Advances in Data Analysis and Classification, 13(2): 559–583.
- Carpaneto, G. and Toth, P. (1980). “Solution of the assignment problem.” ACM Transactions on Mathematical Software, 6(1): 104–111.
- Carvalho, C. M., Chang, J., Lucas, J. E., Nevins, J. R., Wang, Q., and West, M. (2008). “High-dimensional sparse factor modeling: applications in gene expression genomics.” Journal of the American Statistical Association, 103(484): 1438–1456.
- Chen, M., Silva, J., Paisley, J., Wang, C., Dunson, D. B., and Carin, L. (2010). “Compressive sensing on manifolds using a nonparametric mixture of factor analyzers: algorithm and performance bounds.” IEEE Transactions on Signal Processing, 58(12): 6140–6155.
- De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prünster, I., and Ruggiero, M. (2015). “Are Gibbs-type priors the most natural generalization of the Dirichlet process?” IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 212–229.
- Diebolt, J. and Robert, C. P. (1994). “Estimation of finite mixture distributions through Bayesian sampling.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 56(2): 363–375.
- Durante, D. (2017). “A note on the multiplicative gamma process.” Statistics & Probability Letters, 122: 198–204.
- Ferguson, T. S. (1973). “A Bayesian analysis of some nonparametric problems.” The Annals of Statistics, 1(2): 209–230.
- Fokoué, E. and Titterington, D. M. (2003). “Mixtures of factor analysers. Bayesian estimation and inference by stochastic simulation.” Machine Learning, 50(1): 73–94.
- Forina, M., Armanino, C., Lanteri, S., and Tiscornia, E. (1983). “Classification of olive oils from their fatty acid composition.” In Martens, H. and Russwurm Jr., H. (eds.), Food Research and Data Analysis, 189–214. Applied Science Publishers, London.
- Frühwirth-Schnatter, S. (2010). Finite mixture and Markov switching models. Springer Series in Statistics. New York: Springer.
- Frühwirth-Schnatter, S. (2011). “Dealing with label switching under model uncertainty.” In Mengersen, K. L., Robert, C. P., and Titterington, D. M. (eds.), Mixtures: Estimation and Applications, Wiley Series in Probability and Statistics, 193–218. Chichester: John Wiley & Sons.
- Frühwirth-Schnatter, S. and Lopes, H. F. (2010). “Parsimonious Bayesian factor analysis when the number of factors is unknown.” Technical report, The University of Chicago Booth School of Business.
- Frühwirth-Schnatter, S. and Malsiner-Walli, G. (2019). “From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering.” Advances in Data Analysis and Classification, 13(1): 33–63.
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2004). Bayesian data analysis. Chapman and Hall/CRC Press, third edition.
- Ghahramani, Z. and Hinton, G. E. (1996). “The EM algorithm for mixtures of factor analyzers.” Technical report, Department of Computer Science, University of Toronto.
- Ghosh, J. and Dunson, D. B. (2008). “Default prior distributions and efficient posterior computation in Bayesian factor analysis.” Journal of Computational and Graphical Statistics, 18(2): 306–320.
- Green, P. J. and Richardson, S. (2001). “Modelling heterogeneity with and without the Dirichlet process.” Scandinavian Journal of Statistics, 28(2): 355–375.
- Hastie, D. I., Liverani, S., and Richardson, S. (2014). “Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations.” Statistics and Computing, 25(5): 1023–1037.
- Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning. Springer Series in Statistics. New York: Springer, second edition.
- Hubert, L. and Arabie, P. (1985). “Comparing partitions.” Journal of Classification, 2(1): 193–218.
- Kalli, M., Griffin, J. E., and Walker, S. G. (2011). “Slice sampling mixture models.” Statistics and Computing, 21(1): 93–105.
- Kass, R. E. and Raftery, A. E. (1995). “Bayes factors.” Journal of the American Statistical Association, 90(430): 773–795.
- Kim, S., Tadesse, M. G., and Vannucci, M. (2006). “Variable selection in clustering via Dirichlet process mixture models.” Biometrika, 93(4): 877–893.
- Knott, M. and Bartholomew, D. J. (1999). Latent variable models and factor analysis. Number 7 in Kendall’s library of statistics. London: Edward Arnold, second edition.
- Knowles, D. and Ghahramani, Z. (2007). “Infinite sparse factor analysis and infinite independent components analysis.” In Davies, M. E., James, C. J., Abdallah, S. A., and Plumbley, M. D. (eds.), Independent component analysis and signal separation, 381–388. Berlin, Heidelberg: Springer.
- Knowles, D. and Ghahramani, Z. (2011). “Nonparametric Bayesian sparse factor models with application to gene expression modeling.” The Annals of Applied Statistics, 5(2B): 1534–1552.
- Lee, J. and MacEachern, S. N. (2014). “Inference functions in high dimensional Bayesian inference.” Statistics and Its Interface, 7(4): 477–486.
- Legramanti, S., Durante, D., and Dunson, D. B. (2019). “Bayesian cumulative shrinkage for infinite factorizations.” arXiv:1902.04349.
- McLachlan, G. J. and Peel, D. (2000). Finite mixture models. Wiley Series in Probability and Statistics. New York: John Wiley & Sons.
- McNicholas, P. D. (2010). “Model-based classification using latent Gaussian mixture models.” Journal of Statistical Planning and Inference, 140(5): 1175–1181.
- McNicholas, P. D., ElSherbiny, A., McDaid, A. F., and Murphy, T. B. (2018). pgmm: parsimonious Gaussian mixture models. R package version 1.2.3. URL https://cran.r-project.org/package=pgmm.
- McNicholas, P. D. and Murphy, T. B. (2008). “Parsimonious Gaussian mixture models.” Statistics and Computing, 18(3): 285–296.
- McParland, D., Gormley, I. C., McCormick, T. H., Clark, S. J., Kabudula, C. W., and Collinson, M. A. (2014). “Clustering South African households based on their asset status using latent variable models.” The Annals of Applied Statistics, 8(2): 747–767.
- Miller, J. W. and Dunson, D. B. (2018). “Robust Bayesian inference via coarsening.” Journal of the American Statistical Association, 114(527): 1113–1125.
- Miller, J. W. and Harrison, M. T. (2013). “A simple example of Dirichlet process mixture inconsistency for the number of components.” Advances in Neural Information Processing Systems, 26: 199–206.
- Miller, J. W. and Harrison, M. T. (2014). “Inconsistency of Pitman-Yor process mixtures for the number of components.” The Journal of Machine Learning Research, 15(1): 3333–3370.
- Müller, P. and Mitra, R. (2013). “Bayesian nonparametric inference – why and how.” Bayesian Analysis, 8(2): 269–360.
- Murphy, K., Viroli, C., and Gormley, I. C. (2019a). “Supplementary material: infinite mixtures of infinite factor analysers.” Bayesian Analysis.
- Murphy, K., Viroli, C., and Gormley, I. C. (2019b). IMIFA: infinite mixtures of infinite factor analysers and related models. R package version 2.1.0. URL https://cran.r-project.org/package=IMIFA.
- Ng, A. Y., Jordan, M. I., and Weiss, Y. (2001). “On spectral clustering: analysis and an algorithm.” In Advances in neural information processing systems, 849–856. Cambridge, MA, USA: MIT Press.
- Nyamundanda, G., Brennan, L., and Gormley, I. C. (2010). “Probabilistic principal component analysis for metabolomic data.” BMC Bioinformatics, 11(571): 1–11.
- Paisley, J. and Carin, L. (2009). “Nonparametric factor analysis with Beta process priors.” In Proceedings of the 26th annual international conference on machine learning, ICML ’09, 777–784. New York, NY, USA: ACM.
- Papaspiliopoulos, O. and Roberts, G. O. (2008). “Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models.” Biometrika, 95(1): 169–186.
- Papastamoulis, P. (2018). “Overfitting Bayesian mixtures of factor analyzers with an unknown number of components.” Computational Statistics & Data Analysis, 124: 220–234.
- Peel, D. and McLachlan, G. J. (2000). “Robust mixture modelling using the t distribution.” Statistics and Computing, 10: 339–348.
- Perman, M., Pitman, J., and Yor, M. (1992). “Size-biased sampling of Poisson point processes and excursions.” Probability Theory and Related Fields, 92(1): 21–39.
- Pitman, J. (1996). “Random discrete distributions invariant under size-biased permutation.” Advances in Applied Probability, 28(2): 525–539.
- Pitman, J. and Yor, M. (1997). “The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator.” The Annals of Probability, 25(2): 855–900.
- Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). “CODA: convergence diagnosis and output analysis for MCMC.” R News, 6(1): 7–11.
- Raftery, A. E., Newton, M., Satagopan, J., and Krivitsky, P. (2007). “Estimating the integrated likelihood via posterior simulation using the harmonic mean identity.” In Bayesian statistics 8, 1–45.
- R Core Team (2019). R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
- Richardson, S. and Green, P. J. (1997). “On Bayesian analysis of mixtures with an unknown number of components (with discussion).” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59(4): 731–792.
- Ročková, V. and George, E. I. (2016). “Fast Bayesian factor analysis via automatic rotations to sparsity.” Journal of the American Statistical Association, 111(516): 1608–1622.
- Rodriguez, C. E. and Walker, S. G. (2014). “Univariate Bayesian nonparametric mixture modeling with unimodal kernels.” Statistics and Computing, 24(1): 35–49.
- Rousseau, J. and Mengersen, K. (2011). “Asymptotic behaviour of the posterior distribution in overfitted mixture models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(5): 689–710.
- Rue, H. and Held, L. (2005). Gaussian Markov random fields: theory and applications, volume 104 of Monographs on statistics and applied probability. London: Chapman and Hall/CRC Press.
- Scrucca, L., Fop, M., Murphy, T. B., and Raftery, A. E. (2016). “mclust 5: clustering, classification and density estimation using Gaussian finite mixture models.” The R Journal, 8(1): 289–317.
- Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4): 583–639.
- Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2014). “The deviance information criterion: 12 years on.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(3): 485–493.
- Stephens, M. (2000). “Bayesian analysis of mixture models with an unknown number of components – an alternative to reversible jump methods.” The Annals of Statistics, 28(1): 40–74.
- Tipping, M. E. and Bishop, C. M. (1999). “Mixtures of probabilistic principal component analyzers.” Neural Computation, 11(2): 443–482.
- van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K., and van der Werf, M. J. (2006). “Centering, scaling, and transformations: improving the biological information content of metabolomics data.” BMC Genomics, 7(1): 142.
- van Havre, Z., White, N., Rousseau, J., and Mengersen, K. (2015). “Overfitting Bayesian mixture models with an unknown number of components.” PloS one, 10(7): e0131739.
- Viroli, C. (2010). “Dimensionally reduced model-based clustering through mixtures of factor mixture analyzers.” Journal of Classification, 27(3): 363–388.
- Viroli, C. (2011). “Finite mixtures of matrix normal distributions for classifying three-way data.” Statistics and Computing, 21(4): 511–522.
- Walker, S. G. (2007). “Sampling the Dirichlet mixture model with slices.” Communications in Statistics – Simulation and Computation, 36(1): 45–54.
- Wang, C., Pan, G., Tong, T., and Zhu, L. (2015). “Shrinkage estimation of large dimensional precision matrix using random matrix theory.” Statistica Sinica, 25(3): 993–1008.
- Wang, Y., Canale, A., and Dunson, D. B. (2016). “Scalable geometric density estimation.” In Gretton, A. and Robert, C. P. (eds.), Proceedings of the 19th international conference on artificial intelligence and statistics, volume 51 of Proceedings of Machine Learning Research, 857–865. Cadiz, Spain: PMLR.
- West, M. (2003). “Bayesian factor regression models in the ‘large p, small n’ paradigm.” In Bayesian statistics 7, 723–732. Oxford University Press.
- West, M., Müller, P., and Escobar, M. D. (1994). “Hierarchical priors and mixture models, with applications in regression and density estimation.” In Smith, A. F. M. and Freeman, P. R. (eds.), Aspects of uncertainty: a tribute to D. V. Lindley, 363–386. New York: John Wiley & Sons.
- Xing, E. P., Sohn, K. A., Jordan, M. I., and Teh, Y. W. (2006). “Bayesian multi-population haplotype inference via a hierarchical Dirichlet process mixture.” In Proceedings of the 23rd International Conference on Machine Learning, 1049–1056. ACM.
- Yellott, J. I., Jr. (1977). “The relationship between Luce’s choice axiom, Thurstone’s theory of comparative judgment, and the double exponential distribution.” Journal of Mathematical Psychology, 15(2): 109–144.
- Yerebakan, H. Z., Rajwa, B., and Dundar, M. (2014). “The infinite mixture of infinite Gaussian mixtures.” In Advances in Neural Information Processing Systems, 28–36.
Supplemental materials
- Supplementary material: infinite mixtures of infinite factor analysers. Digital Object Identifier: doi:10.1214/19-BA1179SUPP

More like this
- Bayesian nonparametric model with clustering individual co-exposure to pesticides found in the French diet. Crépet, Amélie and Tressou, Jessica, Bayesian Analysis, 2011
- Power-Expected-Posterior Priors for Generalized Linear Models. Fouskakis, Dimitris, Ntzoufras, Ioannis, and Perrakis, Konstantinos, Bayesian Analysis, 2018
- Bayesian Deconvolution and Quantification of Metabolites from J-Resolved NMR Spectroscopy. Heinecke, Andreas, Ye, Lifeng, De Iorio, Maria, and Ebbels, Timothy, Bayesian Analysis, 2020
- Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination. Yau, Christopher and Holmes, Chris, Bayesian Analysis, 2011
- Bayesian hidden Markov tree models for clustering genes with shared evolutionary history. Li, Yang, Ning, Shaoyang, Calvo, Sarah E., Mootha, Vamsi K., and Liu, Jun S., Annals of Applied Statistics, 2019
- Dynamic density estimation with diffusive Dirichlet mixtures. Mena, Ramsés H. and Ruggiero, Matteo, Bernoulli, 2016
- Adaptive Bayesian Density Estimation in Lp-metrics with Pitman-Yor or Normalized Inverse-Gaussian Process Kernel Mixtures. Scricciolo, Catia, Bayesian Analysis, 2014
- A hierarchical framework for state-space matrix inference and clustering. Zuo, Chandler, Chen, Kailei, Hewitt, Kyle J., Bresnick, Emery H., and Keleş, Sündüz, Annals of Applied Statistics, 2016
- Dynamic Regression Models for Time-Ordered Functional Data. Kowal, Daniel R., Bayesian Analysis, 2020
- Bayesian nonparametric Plackett–Luce models for the analysis of preferences for college degree programmes. Caron, François, Teh, Yee Whye, and Murphy, Thomas Brendan, Annals of Applied Statistics, 2014
