## The Annals of Statistics

### Subsampling bootstrap of count features of networks

#### Abstract

Analysis of stochastic models of networks is quite important in light of the huge influx of network data in the social, information and biological sciences, but a proper statistical analysis of features of different stochastic models of networks is still underway. We propose bootstrap subsampling methods for finding the empirical distribution of count features or “moments” (Bickel, Chen and Levina [Ann. Statist. 39 (2011) 2280–2301]) and smooth functions of these features for the networks. Using these methods, we can not only estimate the variance of count features but also get good estimates of the feature counts themselves, which are usually expensive to compute numerically in large networks. We prove theoretical properties of the bootstrap estimates of variance of the count features and show their efficacy through simulation. We also apply the method to some real network data to estimate the variance and expectation of some count features.
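To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple scheme: draw vertex subsets uniformly without replacement, take the induced subgraph, and compute a normalized triangle count on each subsample. The function names (`triangle_density`, `subsample_bootstrap`, `erdos_renyi`), the choice of triangles as the count feature, and the normalization by the number of vertex triples are all illustrative assumptions; the paper's sampling schemes and normalizations may differ.

```python
import random
from itertools import combinations


def triangle_density(adj, nodes):
    """Fraction of vertex triples in `nodes` that form a triangle
    in the subgraph induced by `nodes`. `adj` maps each integer
    vertex to the set of its neighbours (undirected graph)."""
    nodes = list(nodes)
    m = len(nodes)
    if m < 3:
        return 0.0
    node_set = set(nodes)
    count = 0
    for u in nodes:
        # Only look at neighbours larger than u, so each triangle is
        # counted exactly once (at its smallest vertex).
        nbrs = [v for v in adj[u] if v in node_set and v > u]
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                count += 1
    total_triples = m * (m - 1) * (m - 2) / 6
    return count / total_triples


def subsample_bootstrap(adj, m, B, seed=0):
    """Draw B uniform vertex subsamples of size m (assumed scheme) and
    return the mean, the sample variance, and the list of subsample
    triangle densities."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    stats = [triangle_density(adj, rng.sample(nodes, m)) for _ in range(B)]
    mean = sum(stats) / B
    var = sum((s - mean) ** 2 for s in stats) / (B - 1)
    return mean, var, stats


def erdos_renyi(n, p, seed=0):
    """Hypothetical test network: each edge present independently w.p. p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


if __name__ == "__main__":
    g = erdos_renyi(60, 0.2, seed=1)
    mean, var, _ = subsample_bootstrap(g, m=20, B=50, seed=2)
    print("subsample mean:", mean, "subsample variance:", var)
```

The mean over subsamples serves as a cheap estimate of the normalized count, since each subsample touches only `m` vertices. The variance returned here describes variability at subsample size `m`; relating it to the variance of the full-network statistic requires a rescaling that this sketch deliberately omits.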

#### Article information

Source
Ann. Statist., Volume 43, Number 6 (2015), 2384–2411.

Dates
Revised: April 2015
First available in Project Euclid: 7 October 2015

https://projecteuclid.org/euclid.aos/1444222079

Digital Object Identifier
doi:10.1214/15-AOS1338

Mathematical Reviews number (MathSciNet)
MR3405598

Zentralblatt MATH identifier
1326.62067

#### Citation

Bhattacharyya, Sharmodeep; Bickel, Peter J. Subsampling bootstrap of count features of networks. Ann. Statist. 43 (2015), no. 6, 2384–2411. doi:10.1214/15-AOS1338. https://projecteuclid.org/euclid.aos/1444222079

#### References

• [1] Aldous, D. J. (1981). Representations for partially exchangeable arrays of random variables. J. Multivariate Anal. 11 581–598.
• [2] Barabási, A.-L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512.
• [3] Bearman, P. S., Moody, J. and Stovel, K. (2004). Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology 110 44–91.
• [4] Bhattacharyya, S. and Bickel, P. J. (2015). Supplement to “Subsampling bootstrap of count features of networks.” DOI:10.1214/15-AOS1338SUPP.
• [5] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
• [6] Bickel, P. J., Chen, A. and Levina, E. (2011). The method of moments and degree distributions for network models. Ann. Statist. 39 2280–2301.
• [7] Bickel, P. J., Götze, F. and van Zwet, W. R. (1997). Resampling fewer than $n$ observations: Gains, losses, and remedies for losses. Statist. Sinica 7 1–31.
• [8] Bollobás, B., Janson, S. and Riordan, O. (2007). The phase transition in inhomogeneous random graphs. Random Structures Algorithms 31 3–122.
• [9] Chung, F. and Lu, L. (2002). Connected components in random graphs with given expected degree sequences. Ann. Comb. 6 125–145.
• [10] Davis, J. A. and Leinhardt, S. (1972). The structure of positive interpersonal relations in small groups. In Sociological Theories in Progress, Vol. 2 (J. Berger, M. Zelditch and B. Anderson, eds.) 218–251. Houghton-Mifflin, New York.
• [11] Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. (7) 28 33–61.
• [12] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26.
• [13] Frank, O. (2005). Network sampling and model fitting. In Models and Methods in Social Network Analysis (P. J. Carrington, J. Scott and S. S. Wasserman, eds.) 31–56. Cambridge Univ. Press, Cambridge.
• [14] Handcock, M. S. and Gile, K. J. (2010). Modeling social networks from sampled data. Ann. Appl. Stat. 4 5–25.
• [15] Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 1090–1098.
• [16] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137.
• [17] Hoover, D. N. (1979). Relations on probability spaces and arrays of random variables. Institute for Advanced Study, Princeton, NJ.
• [18] Kallenberg, O. (2005). Probabilistic Symmetries and Invariance Principles. Springer, New York.
• [19] Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer, New York.
• [20] Leskovec, J., Kleinberg, J. and Faloutsos, C. (2005). Graphs over time: Densification laws, shrinking diameters and possible explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining 177–187. ACM, New York.
• [21] Lovász, L. (2012). Large Networks and Graph Limits. American Mathematical Society Colloquium Publications 60. Amer. Math. Soc., Providence, RI.
• [22] Middendorf, M., Ziv, E. and Wiggins, C. H. (2005). Inferring network mechanisms: The Drosophila melanogaster protein interaction network. Proc. Natl. Acad. Sci. USA 102 3192–3197.
• [23] Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M. and Alon, U. (2004). Superfamilies of evolved and designed networks. Science 303 1538–1542.
• [24] Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. and Alon, U. (2002). Network motifs: Simple building blocks of complex networks. Science 298 824–827.
• [25] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077–1087.
• [26] Picard, F., Daudin, J.-J., Koskas, M., Schbath, S. and Robin, S. (2008). Assessing the exceptionality of network motifs. J. Comput. Biol. 15 1–20.
• [27] Przytycka, T. M. (2006). An important connection between network motifs and parsimony models. In Research in Computational Molecular Biology 321–335. Springer, Berlin.
• [28] Shen-Orr, S. S., Milo, R., Mangan, S. and Alon, U. (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31 64–68.
• [29] Thompson, S. K. (2012). Sampling, 3rd ed. Wiley, Hoboken, NJ.
• [30] Thompson, S. K. and Frank, O. (2000). Model-based estimation with link-tracing sampling designs. Survey Methodology 26 87–98.
• [31] Traud, A. L., Kelsic, E. D., Mucha, P. J. and Porter, M. A. (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53 526–543.
• [32] Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications. Structural Analysis in the Social Sciences 8. Cambridge Univ. Press, Cambridge.
• [33] Wernicke, S. (2006). Efficient detection of network motifs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 3 347–359.

#### Supplemental materials

• Supplement to “Subsampling bootstrap of count features of networks”. In the Supplement, we prove Theorems 1 and 2, Proposition 6, and Lemmas 7 and 8.