The Annals of Applied Statistics

The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul

Yacine Jernite, Pierre Latouche, Charles Bouveyron, Patrick Rivera, Laurent Jegou, and Stéphane Lamassé

Full-text: Open access


In the last two decades, many random graph models have been proposed to extract knowledge from networks. Most of them look for communities or, more generally, clusters of vertices with homogeneous connection profiles. While the first models focused on networks with binary edges only, extensions now make it possible to deal with valued networks. Recently, new models were also introduced to characterize connection patterns in networks through mixed memberships. This work was motivated by the need to analyze a historical network where a partition of the vertices is given and where edges are typed. The known partition is seen as a decomposition of the network into subgraphs, which we propose to model using a stochastic model with unknown latent clusters. Each subgraph has its own mixing vector, which governs how its vertices are assigned to the clusters. The vertices then connect with a probability depending on the subgraphs only, while the types of edges are assumed to be sampled from the latent clusters. A variational Bayes expectation-maximization algorithm is proposed for inference, as well as a model selection criterion for estimating the number of clusters. Experiments are carried out on simulated data to assess the approach. The proposed methodology is then applied to an ecclesiastical network in Merovingian Gaul. R code, called Rambo, implementing the inference algorithm is available from the authors upon request.
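The generative process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Rambo code, and it covers simulation only, not the variational inference. The function name `sample_rsm` and the parameter names `alpha`, `gamma` and `pi` are assumptions chosen for illustration, as is the choice of a directed network without self-loops.

```python
import random


def sample_rsm(subgraph_of, alpha, gamma, pi, seed=0):
    """Draw one typed, directed network from an RSM-style generative process.

    All names are illustrative assumptions, not taken from the paper's code.
    subgraph_of[i] : known subgraph index of vertex i
    alpha[s][k]    : probability that a vertex of subgraph s is in cluster k
    gamma[s][t]    : probability of an edge from subgraph s to subgraph t
    pi[k][l][c]    : probability of edge type c between clusters k and l
    """
    rng = random.Random(seed)
    n = len(subgraph_of)
    n_clusters = len(alpha[0])
    n_types = len(pi[0][0])

    # Latent cluster of each vertex, drawn from its subgraph's mixing vector.
    z = [rng.choices(range(n_clusters), weights=alpha[s])[0]
         for s in subgraph_of]

    # Edge presence depends on the subgraphs only;
    # the edge type depends on the latent clusters of the two endpoints.
    edges = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: no self-loops
            if rng.random() < gamma[subgraph_of[i]][subgraph_of[j]]:
                weights = pi[z[i]][z[j]]
                edges[(i, j)] = rng.choices(range(n_types), weights=weights)[0]
    return z, edges


# Toy example: 6 vertices, 2 subgraphs, 2 latent clusters, 2 edge types.
subgraph_of = [0, 0, 0, 1, 1, 1]
alpha = [[0.9, 0.1], [0.2, 0.8]]
gamma = [[0.6, 0.1], [0.1, 0.5]]
pi = [[[0.8, 0.2], [0.5, 0.5]],
      [[0.5, 0.5], [0.1, 0.9]]]

z, edges = sample_rsm(subgraph_of, alpha, gamma, pi, seed=1)
print("clusters:", z)
print("typed edges:", edges)
```

The sketch keeps the two layers separate: the known partition drives both the cluster proportions and the presence of edges, and the latent clusters drive only the edge types.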

Article information

Ann. Appl. Stat. Volume 8, Number 1 (2014), 377-405.

First available in Project Euclid: 8 April 2014


Keywords: Ecclesiastical network, subgraphs, stochastic block models, random subgraph model


Jernite, Yacine; Latouche, Pierre; Bouveyron, Charles; Rivera, Patrick; Jegou, Laurent; Lamassé, Stéphane. The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul. Ann. Appl. Stat. 8 (2014), no. 1, 377–405. doi:10.1214/13-AOAS691.




Supplemental materials

  • Supplementary material: Data and code. We provide the original ecclesiastical network, along with a file giving the kingdom of each vertex in the network and R code implementing the variational inference approach for the random subgraph model (RSM).