Bernoulli

  • Volume 21, Number 1 (2015), 537-573.

Convergence of the groups posterior distribution in latent or stochastic block models

Mahendra Mariadassou and Catherine Matias


Abstract

We propose a unified framework for studying both latent and stochastic block models, which are used to simultaneously cluster the rows and columns of a data matrix. In this new framework, we study the behaviour of the groups posterior distribution, given the data. We characterize whether it is possible to asymptotically recover the actual groups on the rows and columns of the matrix, relying on a consistent estimate of the parameter. In other words, we establish sufficient conditions for the groups posterior distribution to converge (as the size of the data increases) to a Dirac mass located at the actual (random) groups configuration. In particular, we highlight some cases where the model assumes symmetries in the matrix of connection probabilities that prevent recovering the original groups. We also discuss the validity of these results when the proportion of non-null entries in the data matrix converges to zero.
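The object studied in the abstract, the posterior distribution of the groups given the data, can be illustrated on a toy example. The sketch below is not the authors' method: it assumes a two-group binary stochastic block model on an undirected graph, plugs in the true parameter where the paper relies on a consistent estimate, and computes the exact posterior by brute-force enumeration, which is only feasible for very small graphs. The function names and the values of `pi` and `gamma` are illustrative assumptions.

```python
import itertools
import math
import random


def simulate_sbm(n, pi, gamma, rng):
    """Draw latent groups z and undirected binary edges from a stochastic block model."""
    z = tuple(rng.choices(range(len(pi)), weights=pi, k=n))
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            # Edge (i, j) is Bernoulli with probability gamma[z_i][z_j].
            edges[(i, j)] = 1 if rng.random() < gamma[z[i]][z[j]] else 0
    return z, edges


def log_joint(z, edges, pi, gamma):
    """Log of P(Z = z, X = edges) under group proportions pi and connection matrix gamma."""
    lp = sum(math.log(pi[q]) for q in z)
    for (i, j), x in edges.items():
        p = gamma[z[i]][z[j]]
        lp += math.log(p) if x else math.log(1.0 - p)
    return lp


def groups_posterior(edges, n, pi, gamma):
    """Exact posterior P(Z = z | X) over all len(pi)**n configurations (small n only)."""
    configs = list(itertools.product(range(len(pi)), repeat=n))
    logs = [log_joint(z, edges, pi, gamma) for z in configs]
    m = max(logs)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(l - m) for l in logs]
    total = sum(weights)
    return {z: w / total for z, w in zip(configs, weights)}


if __name__ == "__main__":
    # Illustrative (assumed) parameter values, chosen without label symmetry.
    pi = [0.6, 0.4]
    gamma = [[0.9, 0.1], [0.1, 0.4]]
    rng = random.Random(0)
    for n in (4, 8, 12):
        z_true, edges = simulate_sbm(n, pi, gamma, rng)
        post = groups_posterior(edges, n, pi, gamma)
        print(n, "posterior mass at the true configuration:", post[z_true])
```

The parameter values are chosen to avoid label symmetry. With equal group proportions and a connection matrix that is unchanged when the two labels are swapped, the true configuration and its label-swapped copy have the same posterior probability, so the posterior mass at the true configuration cannot exceed one half; this is a simple instance of the kind of symmetry the abstract mentions.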

Article information

Source
Bernoulli Volume 21, Number 1 (2015), 537-573.

Dates
First available in Project Euclid: 17 March 2015

Permanent link to this document
http://projecteuclid.org/euclid.bj/1426597081

Digital Object Identifier
doi:10.3150/13-BEJ579

Mathematical Reviews number (MathSciNet)
MR3322330

Zentralblatt MATH identifier
1329.62285

Keywords
biclustering; block clustering; block modelling; co-clustering; latent block model; posterior distribution; stochastic block model

Citation

Mariadassou, Mahendra; Matias, Catherine. Convergence of the groups posterior distribution in latent or stochastic block models. Bernoulli 21 (2015), no. 1, 537–573. doi:10.3150/13-BEJ579. http://projecteuclid.org/euclid.bj/1426597081.


