The Annals of Applied Statistics

Modeling social networks from sampled data

Mark S. Handcock and Krista J. Gile


Abstract

Network models are widely used to represent relational information among interacting units and the structural implications of these relations. Recently, social network studies have focused a great deal of attention on random graph models of networks whose nodes represent individual social actors and whose edges represent a specified relationship between the actors.

Most inference for social network models assumes that the presence or absence of all possible links is observed, that the information is completely reliable, and that there are no measurement (e.g., recording) errors. This is clearly not true in practice, as much network data is collected through sample surveys. In addition, even if a census of a population is attempted, individuals and links between individuals are missed (i.e., do not appear in the recorded data).

In this paper we develop the conceptual and computational theory for inference based on sampled network information. We first review forms of network sampling designs used in practice. We consider inference from the likelihood framework, and develop a typology of network data that reflects their treatment within this frame. We then develop inference for social network models based on information from adaptive network designs.

We motivate and illustrate these ideas by analyzing the effect of link-tracing sampling designs on a collaboration network.
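
As a rough illustration of what a link-tracing design records, the sketch below follows nominations outward from a set of seed actors for a fixed number of waves and then lists the dyads whose tie status would be known. It is a minimal sketch under stated assumptions, not the authors' implementation: the network is undirected, sampled actors report all of their ties without error, and the function names and the toy network are hypothetical.

```python
from itertools import combinations


def link_tracing_sample(adj, seeds, waves):
    """Return the nodes sampled by following links from `seeds` for `waves` waves.

    `adj` maps each node to the set of its neighbours (undirected network).
    """
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        # Actors nominated by the current wave who are not yet in the sample.
        next_wave = {j for i in frontier for j in adj[i]} - sampled
        if not next_wave:
            break
        sampled |= next_wave
        frontier = next_wave
    return sampled


def observed_dyads(nodes, sampled):
    """Dyads whose tie status is recorded.

    Assumption: a dyad is observed when at least one endpoint was sampled,
    because a sampled actor reports all of its ties.
    """
    return {
        frozenset((i, j))
        for i, j in combinations(nodes, 2)
        if i in sampled or j in sampled
    }


# Hypothetical toy network: a short chain with one branch and one isolate.
adj = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2, 4}, 4: {3}, 5: set()}

sampled = link_tracing_sample(adj, seeds=[0], waves=1)
dyads = observed_dyads(sorted(adj), sampled)
total_dyads = len(adj) * (len(adj) - 1) // 2

print("sampled nodes:", sorted(sampled))
print("observed dyads:", len(dyads), "of", total_dyads)
```

In this toy case the dyads among the unsampled actors stay unobserved, which is the kind of partial observation that inference from sampled network data has to account for. Because the sample depends on the ties found in earlier waves, the design is adaptive.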

Article information

Source
Ann. Appl. Stat., Volume 4, Number 1 (2010), 5-25.

Dates
First available in Project Euclid: 11 May 2010

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1273584445

Digital Object Identifier
doi:10.1214/08-AOAS221

Mathematical Reviews number (MathSciNet)
MR2758082

Zentralblatt MATH identifier
1189.62187

Keywords
Exponential family random graph model; p* model; Markov chain Monte Carlo; design-based inference

Citation

Handcock, Mark S.; Gile, Krista J. Modeling social networks from sampled data. Ann. Appl. Stat. 4 (2010), no. 1, 5–25. doi:10.1214/08-AOAS221. https://projecteuclid.org/euclid.aoas/1273584445


