## The Annals of Applied Statistics

- Ann. Appl. Stat.
- Volume 11, Number 1 (2017), 427-455.

### Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US

Pavel N. Krivitsky and Martina Morris


#### Abstract

Egocentric network sampling observes the network of interest from the point of view of a set of sampled actors, who provide information about themselves and anonymized information on their network neighbors. In survey research, this is often the most practical, and sometimes the only, way to observe certain classes of networks, with the sexual networks that underlie HIV transmission being the archetypal case. Although methods exist for recovering some descriptive network features, there is no rigorous and practical statistical foundation for estimation and inference for network models from such data. We identify a subclass of exponential-family random graph models (ERGMs) amenable to being estimated from egocentrically sampled network data, and apply pseudo-maximum-likelihood estimation both to estimate these models and to rigorously quantify the uncertainty of the estimates. For ERGMs parametrized to be invariant to network size, we describe a computationally tractable approach to this problem. We use this methodology to help understand persistent racial disparities in HIV prevalence in the US. We also discuss some extensions, including how our framework may be applied to triadic effects when data about ties among the respondent’s neighbors are also collected.
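The abstract compresses two mechanisms into a sentence each. The Python sketch below illustrates them under simplifying assumptions that are ours, not the paper's. First, statistics that are sums over actors of ego-level quantities (tie counts, same-race tie counts and, when ties among a respondent's neighbors are reported, triangles) can be estimated per capita from a weighted ego sample; here this is done with a Hájek weighted mean and a linearized standard error that treats egos as drawn with replacement. Second, with a −log N offset on the edge count, the edges coefficient of an edges-only model stays near log(mean degree) whatever the network size N. The sample data and the names `hajek_mean`, `per_capita_stats` and `size_invariant_edges_coef` are hypothetical; fitting general ERGMs by pseudo-maximum likelihood would typically require simulation-based fitting, which this sketch does not attempt.

```python
import math

# Hypothetical egocentric sample (illustrative, not from the paper). Each ego
# gives: own race, the races of its partners (alters), the number of ties it
# reports among those alters, and a survey weight.
EGOS = [
    ("B", ["B"], 0, 1.0),
    ("W", ["W", "W"], 1, 2.0),
    ("B", [], 0, 1.0),
    ("W", ["B"], 0, 1.5),
    ("B", ["B", "B", "W"], 2, 1.0),
    ("W", ["W"], 0, 2.0),
]


def hajek_mean(values, weights):
    """Weighted (Hajek) mean and a with-replacement linearization SE."""
    n = len(values)
    total_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, values)) / total_w
    # Linearized residuals; they sum to zero by construction.
    u = [w * (y - mean) / total_w for w, y in zip(weights, values)]
    se = math.sqrt(n / (n - 1) * sum(ui * ui for ui in u))
    return mean, se


def per_capita_stats(egos):
    """Per-capita ERGM statistics estimated from ego reports.

    Summed over the whole population, each tie is reported from both ends and
    each triangle from all three corners, so the ego-level means are divided
    by 2 and 3 respectively.
    """
    weights = [e[3] for e in egos]
    ego_level = {
        "edges": ([len(alters) for _, alters, _, _ in egos], 2),
        "nodematch": (
            [sum(a == race for a in alters) for race, alters, _, _ in egos],
            2,
        ),
        "triangles": ([t for _, _, t, _ in egos], 3),
    }
    out = {}
    for name, (values, divisor) in ego_level.items():
        mean, se = hajek_mean(values, weights)
        out[name] = (mean / divisor, se / divisor)
    return out


def size_invariant_edges_coef(mean_degree, n_nodes):
    """Edges coefficient of an edges-only ERGM with a -log(N) edge offset.

    Picks the tie probability that reproduces mean_degree among n_nodes actors,
    then adds back log(N). The result tends to log(mean_degree) as N grows.
    """
    p = mean_degree / (n_nodes - 1)
    return math.log(p / (1 - p)) + math.log(n_nodes)


if __name__ == "__main__":
    stats = per_capita_stats(EGOS)
    for name, (est, se) in stats.items():
        print(f"{name}: {est:.3f} per capita (SE {se:.3f})")
    mean_degree = 2 * stats["edges"][0]
    for n_nodes in (1_000, 100_000, 10_000_000):
        print(n_nodes, round(size_invariant_edges_coef(mean_degree, n_nodes), 4))
    print("limit:", round(math.log(mean_degree), 4))
```

Because the edges-only model is dyad independent, matching its expected edge count to N times the per-capita estimate has a closed form. Models with degree or triangle terms do not, which is why the sketch stops at the edges coefficient.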

#### Article information

**Source**

Ann. Appl. Stat. Volume 11, Number 1 (2017), 427-455.

**Dates**

Received: March 2015

Revised: December 2016

First available in Project Euclid: 8 April 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aoas/1491616887

**Digital Object Identifier**

doi:10.1214/16-AOAS1010

**Zentralblatt MATH identifier**

1366.62225

**Keywords**

Social network; ERGM; random graph; egocentrically sampled data; pseudo maximum likelihood; pseudolikelihood

#### Citation

Krivitsky, Pavel N.; Morris, Martina. Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US. Ann. Appl. Stat. 11 (2017), no. 1, 427–455. doi:10.1214/16-AOAS1010. https://projecteuclid.org/euclid.aoas/1491616887

#### Supplemental materials

- Appendices A–C. Additional derivations and results referenced in Sections 5, 6, 7, and 8. Digital Object Identifier: doi:10.1214/16-AOAS1010SUPP

### More like this

- Modeling social networks from sampled data. Handcock, Mark S. and Gile, Krista J., The Annals of Applied Statistics, 2010
- Probabilistic projections of HIV prevalence using Bayesian melding. Alkema, Leontine, Raftery, Adrian E., and Clark, Samuel J., The Annals of Applied Statistics, 2007
- Conflict Diagnostics in Directed Acyclic Graphs, with Applications in Bayesian Evidence Synthesis. Presanis, Anne M., Ohlssen, David, Spiegelhalter, David J., and De Angelis, Daniela, Statistical Science, 2013
- The influence of education in reducing the HIV epidemic. Margevicius, Renee and Joshi, Hem, Involve: A Journal of Mathematics, 2013
- Exit polling and racial bloc voting: Combining individual-level and R×C ecological data. Greiner, D. James and Quinn, Kevin M., The Annals of Applied Statistics, 2010
- Modeling of the HIV infection epidemic in the Netherlands: A multi-parameter evidence synthesis approach. Conti, Stefano, Presanis, Anne M., van Veen, Maaike G., Xiridou, Maria, Donoghoe, Martin C., Rinder Stengaard, Annemarie, and De Angelis, Daniela, The Annals of Applied Statistics, 2011
- Estimating hidden population size using Respondent-Driven Sampling data. Handcock, Mark S., Gile, Krista J., and Mar, Corinne M., Electronic Journal of Statistics, 2014
- Bayesian inference for queueing networks and modeling of internet services. Sutton, Charles and Jordan, Michael I., The Annals of Applied Statistics, 2011
- Consistency under sampling of exponential random graph models. Shalizi, Cosma Rohilla and Rinaldo, Alessandro, The Annals of Statistics, 2013
- High-dimensional data: p ≫ n in mathematical statistics and bio-medical applications. Van De Geer, Sara A. and Van Houwelingen, Hans C., Bernoulli, 2004