## Statistical Science

### On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry

#### Abstract

The modeling and analysis of networks and network data have seen an explosion of interest in recent years and represent an exciting direction for potential growth in statistics. Despite the substantial amount of work already done in this area by researchers from various disciplines, however, many questions of a decidedly foundational nature (natural analogues of standard questions already posed and addressed in more classical areas of statistics) have yet to be posed, much less addressed. Here we raise and consider one such question in connection with network modeling. Specifically, we ask, “Given an observed network, what is the sample size?” Using simple, illustrative examples from the class of exponential random graph models, we show that the answer to this question can depend strongly on basic properties of the networks expected under the model as the number of vertices $n_{V}$ in the network grows. In particular, adopting the (asymptotic) scaling of the variance of the maximum likelihood parameter estimates as a notion of effective sample size ($n_{\mathrm{eff}}$), we show that when modeling the overall propensity to have ties and the propensity to reciprocate ties, whether the networks are sparse or not under the model (i.e., having a constant or an increasing number of ties per vertex, respectively) is sufficient to yield an order of magnitude difference in $n_{\mathrm{eff}}$, from $O(n_{V})$ to $O(n^{2}_{V})$. In addition, we report simulation study results that suggest similar properties for models for triadic (friend-of-a-friend) effects. We then explore some practical implications of this result, using both simulation and data on food-sharing from Lamalera, Indonesia.
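The sparse-versus-dense contrast can be illustrated with a minimal sketch that assumes the simplest possible case: a directed Bernoulli graph whose only parameter is the log-odds of a tie. This is not necessarily the authors' derivation, and the function names and the constants `DENSE_P` and `MEAN_DEGREE` are hypothetical choices made for illustration. Under this assumption the Fisher information for the tie parameter is $n_{V}(n_{V}-1)\,p(1-p)$, and its growth rate in $n_{V}$ plays the role of $n_{\mathrm{eff}}$.

```python
import math


def edge_information(n_v, p):
    """Fisher information for theta = logit(p) in a directed Bernoulli graph.

    Each of the n_v * (n_v - 1) ordered pairs of distinct vertices carries
    an independent tie with probability p, contributing p * (1 - p) each.
    """
    n_dyads = n_v * (n_v - 1)
    return n_dyads * p * (1.0 - p)


def scaling_exponent(info_fn, n_small=1_000, n_large=100_000):
    """Log-log slope of the information between two network sizes."""
    rise = math.log(info_fn(n_large)) - math.log(info_fn(n_small))
    run = math.log(n_large) - math.log(n_small)
    return rise / run


DENSE_P = 0.1      # hypothetical fixed tie probability (non-sparse regime)
MEAN_DEGREE = 5.0  # hypothetical fixed expected out-degree (sparse regime)

# Non-sparse: p stays constant, so ties per vertex grow with n_v.
dense_exponent = scaling_exponent(lambda n: edge_information(n, DENSE_P))

# Sparse: p shrinks like 1 / n_v, so ties per vertex stay constant.
sparse_exponent = scaling_exponent(
    lambda n: edge_information(n, MEAN_DEGREE / (n - 1))
)

print(f"dense regime:  information grows like n_V^{dense_exponent:.2f}")
print(f"sparse regime: information grows like n_V^{sparse_exponent:.2f}")
```

In this sketch the variance of the maximum likelihood estimate is approximately the reciprocal of the information, so an exponent near 2 corresponds to $n_{\mathrm{eff}} = O(n^{2}_{V})$ and an exponent near 1 to $n_{\mathrm{eff}} = O(n_{V})$.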

#### Article information

Source
Statist. Sci. Volume 30, Number 2 (2015), 184–198.

Dates
First available in Project Euclid: 3 June 2015

Permanent link
https://projecteuclid.org/euclid.ss/1433341477

Digital Object Identifier
doi:10.1214/14-STS502

Mathematical Reviews number (MathSciNet)
MR3353102

Zentralblatt MATH identifier
1332.62036

#### Citation

Krivitsky, Pavel N.; Kolaczyk, Eric D. On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry. Statist. Sci. 30 (2015), no. 2, 184–198. doi:10.1214/14-STS502. https://projecteuclid.org/euclid.ss/1433341477

#### Supplemental materials

• Supplement to “On the Question of Effective Sample Size in Network Modeling: An Asymptotic Inquiry”. This document contains proofs of the results reported in the body of the article.