The Annals of Statistics

Asymptotics in directed exponential random graph models with an increasing bi-degree sequence

Ting Yan, Chenlei Leng, and Ji Zhu

Full-text: Open access

Abstract

Although asymptotic analyses of undirected network models based on degree sequences have started to appear in the recent literature, the statistical properties of directed network models remain an open problem. In this paper, we provide for the first time a rigorous analysis of directed exponential random graph models that use the in-degrees and out-degrees as sufficient statistics, with either binary or continuous weighted edges. We establish the uniform consistency and the asymptotic normality of the maximum likelihood estimate when the number of parameters grows and only one realized observation of the graph is available. One key technique in the proofs is to approximate the inverse of the Fisher information matrix with high accuracy by a simple matrix. Numerical studies confirm our theoretical findings.
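The abstract describes the binary-edge model and the Fisher-information approximation only in words. The following is a minimal sketch under stated assumptions, not the authors' code. It assumes the logistic parametrization P(a_ij = 1) = e^{α_i+β_j}/(1+e^{α_i+β_j}) for i ≠ j, with β_n = 0 for identifiability. Under that assumption, the out-degrees d_i and in-degrees b_j are the sufficient statistics and the Fisher information V is a (2n−1)×(2n−1) matrix. The sketch fits the MLE by damped Newton ascent and compares V^{-1} with a diagonal-plus-block-constant matrix S = diag(1/v_ii) ± 1/v_{2n,2n}, where the sign is + on the α–α and β–β blocks and − on the α–β blocks. This form of S is our own reading, obtained by viewing the sign-flipped V as a grounded graph Laplacian; consult the paper for its exact approximating matrix and error bound. All function names (`edge_probs`, `fit_mle`, `approx_inverse`, and so on) are hypothetical.

```python
import numpy as np


def edge_probs(alpha, beta):
    """Assumed model: p_ij = logistic(alpha_i + beta_j); diagonal zeroed (no self-loops)."""
    p = 1.0 / (1.0 + np.exp(-(alpha[:, None] + beta[None, :])))
    np.fill_diagonal(p, 0.0)
    return p


def split(theta, n):
    """theta = (alpha_1..alpha_n, beta_1..beta_{n-1}); beta_n is fixed at 0."""
    return theta[:n], np.append(theta[n:], 0.0)


def log_lik(theta, d, b):
    """sum_i alpha_i d_i + sum_j beta_j b_j - sum_{i != j} log(1 + e^{alpha_i + beta_j})."""
    alpha, beta = split(theta, len(d))
    terms = np.logaddexp(0.0, alpha[:, None] + beta[None, :])
    np.fill_diagonal(terms, 0.0)
    return alpha @ d + beta @ b - terms.sum()


def score_and_fisher(theta, d, b):
    """Score vector and Fisher information V (= minus the Hessian) in the 2n-1 free parameters."""
    n = len(d)
    alpha, beta = split(theta, n)
    p = edge_probs(alpha, beta)
    u = p * (1.0 - p)  # Bernoulli variances u_ij; zero on the diagonal
    score = np.concatenate([d - p.sum(axis=1), (b - p.sum(axis=0))[:-1]])
    V = np.block([
        [np.diag(u.sum(axis=1)), u[:, :-1]],
        [u[:, :-1].T, np.diag(u.sum(axis=0)[:-1])],
    ])
    return score, V, u


def fit_mle(d, b, max_iter=100, tol=1e-10):
    """Newton ascent with step halving; the log-likelihood is concave in theta."""
    n = len(d)
    theta = np.zeros(2 * n - 1)
    for _ in range(max_iter):
        score, V, _ = score_and_fisher(theta, d, b)
        if np.max(np.abs(score)) < tol:
            break
        step = np.linalg.solve(V, score)  # Newton direction V^{-1} * score
        t, ll = 1.0, log_lik(theta, d, b)
        while log_lik(theta + t * step, d, b) < ll and t > 1e-8:
            t /= 2.0  # backtrack until the log-likelihood does not decrease
        theta = theta + t * step
    return theta


def approx_inverse(u):
    """Hypothetical S: diag(1/v_ii) plus +-1/v_{2n,2n} on the alpha/beta blocks."""
    n = u.shape[0]
    diag = np.concatenate([u.sum(axis=1), u.sum(axis=0)[:-1]])
    v_last = u.sum(axis=0)[-1]  # v_{2n,2n}: variance of the in-degree b_n
    sign = np.concatenate([np.ones(n), -np.ones(n - 1)])
    return np.diag(1.0 / diag) + np.outer(sign, sign) / v_last


# Simulate one directed graph from the assumed model and fit it.
rng = np.random.default_rng(0)
n = 200
alpha = rng.uniform(-1.0, 1.0, n)
beta = np.append(rng.uniform(-1.0, 1.0, n - 1), 0.0)
a = (rng.random((n, n)) < edge_probs(alpha, beta)).astype(float)
d, b = a.sum(axis=1), a.sum(axis=0)  # out-degrees, in-degrees

theta_hat = fit_mle(d, b)
theta_true = np.concatenate([alpha, beta[:-1]])
print("max |theta_hat - theta|:", np.max(np.abs(theta_hat - theta_true)))

_, V, u = score_and_fisher(theta_true, d, b)
V_inv = np.linalg.inv(V)
S = approx_inverse(u)
print("max |V^-1 - S|         :", np.max(np.abs(V_inv - S)))
print("max |V^-1 - diag(1/v)| :", np.max(np.abs(V_inv - np.diag(1.0 / np.diag(V)))))
```

One would expect the error of S to be far smaller than that of the purely diagonal approximation. The reason is that V^{-1} has off-diagonal entries of size roughly 1/v_{2n,2n}, which diag(1/v_ii) ignores but the block-constant term captures.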

Article information

Source
Ann. Statist., Volume 44, Number 1 (2016), 31-57.

Dates
Received: December 2014
Revised: May 2015
First available in Project Euclid: 10 December 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1449755956

Digital Object Identifier
doi:10.1214/15-AOS1343

Mathematical Reviews number (MathSciNet)
MR3449761

Zentralblatt MATH identifier
1331.62110

Subjects
Primary: 62F10: Point estimation; 62F12: Asymptotic properties of estimators
Secondary: 62B05: Sufficient statistics and fields; 62E20: Asymptotic distribution theory; 05C80: Random graphs [See also 60B20]

Keywords
Bi-degree sequence; central limit theorem; consistency; directed exponential random graph models; Fisher information matrix; maximum likelihood estimation

Citation

Yan, Ting; Leng, Chenlei; Zhu, Ji. Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. Ann. Statist. 44 (2016), no. 1, 31–57. doi:10.1214/15-AOS1343. https://projecteuclid.org/euclid.aos/1449755956



References

  • [1] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US Election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36–43. ACM, New York.
  • [2] Akoglu, L., Vaz de Melo, P. O. S. and Faloutsos, C. (2012). Quantifying reciprocity in large weighted communication networks. Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science 7302 85–96.
  • [3] Bader, G. D. and Hogue, C. W. V. (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4 2–27.
  • [4] Barndorff-Nielsen, O. (1973). Exponential families and conditioning. Ph.D. thesis, Univ. of Copenhagen.
  • [5] Berk, R. H. (1972). Consistency and asymptotic normality of MLE’s for exponential models. Ann. Math. Statist. 43 193–204.
  • [6] Bickel, P. J., Chen, A. and Levina, E. (2011). The method of moments and degree distributions for network models. Ann. Statist. 39 2280–2301.
  • [7] Bolla, M. and Elbanna, A. (2014). Estimating parameters of a multipartite loglinear graph model via the EM algorithm. Preprint. Available at arXiv:1411.7934.
  • [8] Bradley, R. A. and Terry, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39 324–345.
  • [9] Chatterjee, S. and Diaconis, P. (2013). Estimating and understanding exponential random graph models. Ann. Statist. 41 2428–2461.
  • [10] Chatterjee, S., Diaconis, P. and Sly, A. (2011). Random graphs with a given degree sequence. Ann. Appl. Probab. 21 1400–1435.
  • [11] Chen, N. and Olvera-Cravioto, M. (2013). Directed random graphs with given degree distributions. Stoch. Syst. 3 1–40.
  • [12] Diesner, J. and Carley, K. M. (2005). Exploration of communication networks from the Enron email corpus. In Proceedings of Workshop on Link Analysis, Counterterrorism and Security, SIAM International Conference on Data Mining 3–14. SIAM, Philadelphia, PA.
  • [13] Erdős, P. L., Miklós, I. and Toroczkai, Z. (2010). A simple Havel–Hakimi type algorithm to realize graphical degree sequences of directed graphs. Electron. J. Combin. 17 Research Paper 66, 10.
  • [14] Fienberg, S. E. (2012). A brief history of statistical models for network analysis and open challenges. J. Comput. Graph. Statist. 21 825–839.
  • [15] Fienberg, S. E., Petrović, S. and Rinaldo, A. (2011). Algebraic statistics for $p_{1}$ random graph models: Markov bases and their uses. In Looking Back. Lect. Notes Stat. Proc. (N. J. Dorans and S. Sinharay, eds.) 202 21–38. Springer, New York.
  • [16] Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. Ann. Statist. 40 996–1023.
  • [17] Fienberg, S. E. and Wasserman, S. (1981). An exponential family of probability distributions for directed graphs: Comment. J. Amer. Statist. Assoc. 76 54–57.
  • [18] Fienberg, S. E. and Wasserman, S. S. (1981). Categorical data analysis of single sociometric relations. Sociol. Method. 1981 156–192.
  • [19] Fischer, G. H. (1981). On the existence and uniqueness of maximum-likelihood estimates in the Rasch model. Psychometrika 46 59–77.
  • [20] Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821–7826 (electronic).
  • [21] Haberman, S. J. (1977). Maximum likelihood estimates in exponential response models. Ann. Statist. 5 815–841.
  • [22] Haberman, S. J. (1981). An exponential family of probability distributions for directed graphs: Comment. J. Amer. Statist. Assoc. 76 60–61.
  • [23] Handcock, M. S. (2003). Assessing degeneracy in statistical models of social networks, Working Paper 39. Technical report, Center for Statistics and the Social Sciences, Univ. Washington, Seattle, WA.
  • [24] Helleringer, S. and Kohler, H.-P. (2007). Sexual network structure and the spread of HIV in Africa: Evidence from Likoma Island, Malawi. AIDS 21 2323–2332.
  • [25] Hillar, C. and Wibisono, A. (2013). Maximum entropy distributions on graphs. Preprint. Available at arXiv:1301.3321.
  • [26] Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc. 76 33–65.
  • [27] Hunter, D. R. and Handcock, M. S. (2006). Inference in curved exponential family models for networks. J. Comput. Graph. Statist. 15 565–583.
  • [28] Kantorovič, L. V. (1948). On Newton’s method for functional equations. Dokl. Akad. Nauk SSSR 59 1237–1240.
  • [29] Kim, H., Del Genio, C. I., Bassler, K. E. and Toroczkai, Z. (2012). Constructing and sampling directed graphs with given degree sequences. New J. Phys. 14 023012.
  • [30] Kossinets, G. and Watts, D. J. (2006). Empirical analysis of an evolving social network. Science 311 88–90.
  • [31] Loève, M. (1977). Probability Theory. I, 4th ed. Springer, New York.
  • [32] Nepusz, T., Yu, H. and Paccanaro, A. (2012). Detecting overlapping protein complexes in protein–protein interaction networks. Nat. Methods 9 471–472.
  • [33] Newman, M. E. J. (2002). Spread of epidemic disease on networks. Phys. Rev. E (3) 66 016128, 11.
  • [34] Olhede, S. C. and Wolfe, P. J. (2012). Degree-based network models. Preprint. Available at arXiv:1211.6537.
  • [35] Ortega, J. M. (1968). The Newton–Kantorovich theorem. Amer. Math. Monthly 75 658–660.
  • [36] Ortega, J. M. and Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York.
  • [37] Petrović, S., Rinaldo, A. and Fienberg, S. E. (2010). Algebraic statistics for a directed random graph model with reciprocation. In Algebraic Methods in Statistics and Probability II. Contemp. Math. 516 (M. A. G. Viana and H. P. Wynn, eds.) 261–283. Amer. Math. Soc., Providence, RI.
  • [38] Polyak, B. T. (2004). Newton–Kantorovich method and its global convergence. J. Math. Sci. 133 1513–1523.
  • [39] Rinaldo, A., Petrović, S. and Fienberg, S. E. (2013). Maximum likelihood estimation in the $\beta$-model. Ann. Statist. 41 1085–1110.
  • [40] Robins, G. and Pattison, P. (2007). An introduction to exponential random graph ($p^{*}$) models for social networks. Soc. Netw. 29 173–191.
  • [41] Robins, G., Pattison, P. and Wang, P. (2009). Closure, connectivity and degree distributions: Exponential random graph ($p^{*}$) models for directed social networks. Soc. Netw. 31 105–117.
  • [42] Robins, G. L., Snijders, T. A. B., Wang, P., Handcock, M. and Pattison, P. (2007). Recent developments in exponential random graph ($p^{*}$) models for social networks. Soc. Netw. 29 192–215.
  • [43] Salathé, M., Kazandjieva, M., Lee, J. W., Levis, P., Feldman, M. W. and Jones, J. H. (2010). A high-resolution human contact network for infectious disease transmission. Proc. Natl. Acad. Sci. USA 107 22020–22025.
  • [44] Schweinberger, M. (2011). Instability, sensitivity, and degeneracy of discrete exponential families. J. Amer. Statist. Assoc. 106 1361–1370.
  • [45] Shalizi, C. R. and Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. Ann. Statist. 41 508–535.
  • [46] Simons, G. and Yao, Y.-C. (1999). Asymptotics when the number of parameters tends to infinity in the Bradley–Terry model for paired comparisons. Ann. Statist. 27 1041–1060.
  • [47] Tapia, R. A. (1971). Classroom Notes: The Kantorovich theorem for Newton’s method. Amer. Math. Monthly 78 389–392.
  • [48] von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S. G., Fields, S. and Bork, P. (2002). Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417 399–403.
  • [49] Wainwright, M. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1 1–305.
  • [50] Wu, N. (1997). The Maximum Entropy Method. Springer, Berlin.
  • [51] Yan, T., Leng, C. and Zhu, J. (2015). Supplement to “Asymptotics in directed exponential random graph models with an increasing bi-degree sequence.” DOI:10.1214/15-AOS1343SUPP.
  • [52] Yan, T. and Leng, C. (2015). A simulation study of the $p_{1}$ model for directed random graphs. Stat. Interface 8 255–266.
  • [53] Yan, T. and Xu, J. (2013). A central limit theorem in the $\beta$-model for undirected random graphs with a diverging number of vertices. Biometrika 100 519–524.
  • [54] Yan, T., Zhao, Y. and Qin, H. (2015). Asymptotic normality in the maximum entropy models on graphs with an increasing number of parameters. J. Multivariate Anal. 133 61–76.
  • [55] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 2266–2292.

Supplemental materials

  • Supplement to “Asymptotics in directed exponential random graph models with an increasing bi-degree sequence.” The supplemental material contains proofs of the lemmas in Section 2.2, the theorems and lemmas in Sections 2.3 and 2.4, Proposition 1 and Theorem 7.