The Annals of Applied Statistics

Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US

Pavel N. Krivitsky and Martina Morris


Egocentric network sampling observes the network of interest from the point of view of a set of sampled actors, who provide information about themselves and anonymized information on their network neighbors. In survey research, this is often the most practical, and sometimes the only, way to observe certain classes of networks, with the sexual networks that underlie HIV transmission being the archetypal case. Although methods exist for recovering some descriptive network features, there is no rigorous and practical statistical foundation for estimation and inference for network models from such data. We identify a subclass of exponential-family random graph models (ERGMs) amenable to being estimated from egocentrically sampled network data, and apply pseudo-maximum-likelihood estimation to do so and to rigorously quantify the uncertainty of the estimates. For ERGMs parametrized to be invariant to network size, we describe a computationally tractable approach to this problem. We use this methodology to help understand persistent racial disparities in HIV prevalence in the US. We also discuss some extensions, including how our framework may be applied to triadic effects when data about ties among the respondent’s neighbors are also collected.
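As a rough illustration of why some network statistics can be recovered from egocentric data, the sketch below estimates two statistics that decompose into per-actor sums: the number of ties and the number of ties within the same group. It is not the authors' estimator or code. The `Ego` record, the `estimate_stats` function, the toy data, and the assumption that the population size is known are all hypothetical and chosen only for illustration. Each tie is reported by both of its endpoints in a full census, so the weighted per-capita mean is scaled by the population size and halved.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Ego:
    """One sampled respondent (hypothetical record layout)."""
    weight: float            # survey sampling weight
    group: str               # respondent's own attribute, e.g. a demographic group
    alter_groups: List[str]  # reported attribute of each network neighbor


def estimate_stats(egos: List[Ego], pop_size: float) -> Dict[str, float]:
    """Scale weighted per-ego means up to population-level tie counts."""
    total_weight = sum(e.weight for e in egos)

    # Weighted mean number of reported neighbors per respondent.
    mean_degree = sum(e.weight * len(e.alter_groups) for e in egos) / total_weight

    # Weighted mean number of neighbors sharing the respondent's group.
    mean_same = sum(
        e.weight * sum(g == e.group for g in e.alter_groups) for e in egos
    ) / total_weight

    # Every tie has two endpoints, so per-capita totals count it twice.
    return {
        "edges": pop_size * mean_degree / 2,
        "nodematch": pop_size * mean_same / 2,
    }


# Toy data, invented for this example only.
egos = [
    Ego(weight=1.0, group="A", alter_groups=["A", "B"]),
    Ego(weight=1.0, group="B", alter_groups=["B"]),
    Ego(weight=2.0, group="A", alter_groups=[]),
]
stats = estimate_stats(egos, pop_size=100)
print(stats)
```

Statistics of this kind depend only on what each respondent can report about their own ties, which is the intuition behind restricting attention to a subclass of models; statistics involving ties among neighbors need the additional data mentioned at the end of the abstract.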

Article information

Ann. Appl. Stat. Volume 11, Number 1 (2017), 427-455.

Received: March 2015
Revised: December 2016
First available in Project Euclid: 8 April 2017

Keywords: Social network; ERGM; random graph; egocentrically sampled data; pseudo maximum likelihood; pseudolikelihood


Krivitsky, Pavel N.; Morris, Martina. Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US. Ann. Appl. Stat. 11 (2017), no. 1, 427--455. doi:10.1214/16-AOAS1010.



Supplemental materials

  • Appendices A–C. Additional derivations and results referenced in Sections 5, 6, 7, and 8.