The Annals of Applied Statistics

Uncovering latent structure in valued graphs: A variational approach

Mahendra Mariadassou, Stéphane Robin, and Corinne Vacher

Full-text: Open access

Abstract

As more and more network-structured data sets are available, the statistical analysis of valued graphs has become common place. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case.

We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graphs models and allows to include covariates. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems.

Article information

Source
Ann. Appl. Stat. Volume 4, Number 2 (2010), 715-742.

Dates
First available in Project Euclid: 3 August 2010

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1280842137

Digital Object Identifier
doi:10.1214/10-AOAS361

Mathematical Reviews number (MathSciNet)
MR2758646

Zentralblatt MATH identifier
1194.62125

Keywords
Ecological networks host–parasite interactions latent structure mixture model random graph valued graph variational method

Citation

Mariadassou, Mahendra; Robin, Stéphane; Vacher, Corinne. Uncovering latent structure in valued graphs: A variational approach. Ann. Appl. Stat. 4 (2010), no. 2, 715--742. doi:10.1214/10-AOAS361. https://projecteuclid.org/euclid.aoas/1280842137.


Export citation

References

  • Airoldi, E. M. and Carley, K. M. (2005). Sampling algorithms for pure network topologies. ACM KDD Explorations 7 13–22.
  • Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
  • Albert, R. and Barabási, A. L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys. 74 47–97.
  • Attias, H. (2000). A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems 12 209–215. MIT Press, Cambridge.
  • Barabási, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512.
  • Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: With application to scoring graphical model structures. In Bayesian Statistics 7 (J. M. Bernardo et al., eds.) 543–552. Oxford Univ. Press, Oxford.
  • Bersier, L. F. and Kehrli, P. (2008). The signature of phylogenetic constraints on food-web structure. Ecol. Complex. 5 132–139.
  • Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Machine Intel. 22 719–725.
  • Blomberg, S. P. and Garland, T. J. (2002). Tempo and mode in evolution: Phylogenetic inertia, adaptation and comparative methods. J. Evol. Biol. 15 899–910.
  • Brandle, M. and Brandl, R. (2006). Is the composition of phytophagous insects and parasitic fungi among trees predictable? Oikos 113 296–304.
  • Burnham, K. P. and Anderson, R. A. (1998). Model Selection and Inference: A Practical Information-Theoretic Approach. Wiley, New York.
  • Cattin, M. F., Bersier, L. F., Banasek-Richter, C. C., Baltensperger, R. and Gabriel, J. P. (2004). Phylogenetic constraints and adaptation explain food-web structure. Nature 427 835–839.
  • Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Statist. Comput. 18 173–183.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39 1–38.
  • Erdös, P. and Rényi, A. (1959). On random graphs, i. Publ. Math. 6 290–297.
  • Fienberg, S. E. and Wasserman, S. (1981). Categorical data analysis of single sociometric relations. In Sociological Methodology 1981 156–192. Jossey-Bass, San Francisco.
  • Fienberg, S. E., Meyer, M. M. and Wasserman, S. S. (1985). Statistical analysis of multiple sociometric relations. J. Amer. Statist. Assoc. 80 51–67.
  • Getoor, L. and Diehl, C. P. (2004). Link mining: A survey. SIGKDD Explor. 7 3–12.
  • Gilbert, G. S. and Webb, C. O. (2007). Phylogenetic signal in plant pathogen-host range. Proc. Natl. Acad. Sci. USA 104 4979–4983.
  • Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821–7826.
  • Govaert, G. and Nadif, M. (2005). An EM algorithm for the block mixture model. IEEE Trans. Pattern Anal. Machine Intel. 27 643–647.
  • Hofman, J. M. and Wiggins, C. H. (2008). A Bayesian approach to network modularity. Phys. Rev. Lett. 100 258701.
  • Holland, P. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc. 76 33–50.
  • Ives, A. R. and Godfray, H. C. J. (2006). Phylogenetic analysis of trophic associations. Am. Nat. 16 E1–E14.
  • Jaakkola, T. (2000). Tutorial on variational approximation methods. In Advanced Mean Field Methods: Theory and Practice. MIT Press, Cambridge.
  • Jaccard, P. (1901). Tude comparative de la distribution florale dans une portion des alpes et des jura. Bullet. Soc. Vaud. Sci. Natur. 37 547–579.
  • Jordan, M. I., Ghahramani, Z., Jaakkola, T. and Saul, L. K. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37 183–233.
  • Kemp, C., Griffiths, T. H. and Tenenbaum, J. B. (2004). Discovering latent classes in relational data. Technical report, MIT Computer Science and Artificial Intelligence Laboratory.
  • Leisink, M. A. R. and Kappen, H. J. (2001). A tighter bound for graphical models. Neural Comput. 13 2149–2171.
  • Lorrain, F. and White, H. C. (1971). Structural equivalence of individuals in social networks. J. Math. Soc. 1 49–80.
  • Mariadassou, M. (2006). Estimation paramétrique dans le modèle ERMG. Master’s thesis, Univ. Paris XI/Ecole Nationale Supèrieure.
  • Mariadassou, M., Robin, S. and Vacher, C. (2010). Supplement to “Uncovering latent structure in valued graphs: A variational approach.” DOI: 10.1214/07-AOAS361SUPP.
  • McGrory, C. A. and Titterington, D. M. (2007). Variational approximations in Bayesian model selection for finite mixture distributions. Comput. Statist. Data Anal. 51 5352–5367.
  • McLahan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.
  • Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Phys. Rev. E 69 066133.
  • Newman, M. E. J., Watts, D. J. and Strogatz, S. H. (2002). Random graph models of social networks. Proc. Natl. Acad. Sci. USA 99 2566–2572.
  • Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic block-structures. J. Amer. Statist. Assoc. 96 1077–1087.
  • Paradis, E., Claude, J. and Strimmer, K. (2004). Ape: Analyses of phylogenetics and evolution in R language. Bioinformatics 20 289–290.
  • Pattison, P. E. and Robins, G. L. (2007). Probabilistic network theory. In Handbook of Probability Theory with Applications. Sage, Thousand Oaks, CA.
  • Picard, F., Daudin, J.-J., Miele, V., Mariadassou, M. and Robin, S. (2007). A novel framework for random graph models with heterogeneous connectivity structure. Submitted.
  • Poulin, R. (2005). Relative infection levels and taxonomic distances among the host species used by a parasite: Insights into parasite specialization. Parasitology 130 109–115.
  • Rezende, E. L., Lavabre, J. E., Guimaraes, P. R., Jr., Jordano, P. and Bascompte, J. (2007). Non-random coextinctions in phylogenetically structured mutualistic networks. Nature 448 925–928.
  • Ricklefs, R. E. and Miller, G. L. (2000). Community ecology. In Ecology, 4th ed. Freeman, San Francisco, CA.
  • Rossberg, A. G., Ishii, R., Amemiya, T. and Itoh, K. (2006). Food webs: Experts consuming families of experts. J. Theoret. Biol. 241 552–563.
  • Tykiakanis, J. M., Tscharntke, T. and Lewis, O. T. (2007). Habitat modification alters the structure of tropical host-parasitoid food webs. Nature 51 202–205.
  • Vacher, C., Piou, D. and Desprez-Loustau, M.-L. (2008). Architecture of an antagonistic tree/fungus network: The asymmetric influence of past evolutionary history. PLoS ONE 3 1740.
  • von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555–586.
  • Webb, C. O. and Donoghue, M. J. (2005). Phylomatic: Tree assembly for applied phylogenetics. Mol. Ecol. Notes 5 181–183.
  • Winn, J., Bishop, C. M. and Jaakkola, T. (2005). Variational message passing. J. Mach. Learn. Res. 6 661–694.
  • Xing, E., Jordan, M. and Russell, S. (2003). A generalized mean field algorithm for variational inference in exponential families. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03) 583–591. Morgan Kaufmann, San Francisco, CA.
  • Yedidia, J. S., Freeman, W. T. and Weiss, Y. (2005). Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Inform. Theory 15 2282–2312.

Supplemental materials

  • Supplementary material: Interaction network between tree and fungal species. This file contains: •The adjacency matrix of interactions between tree and fungal species. •The list of the tree species. •The list of the fungal species. •The matrix of genetic distances between tree species. •The matrix of geographical distances between tree species. •The matrix of taxonomic distances between fungal species. •The matrix of nutritional type of the fungal species.