The Annals of Applied Statistics

Estimating within-household contact networks from egocentric data

Gail E. Potter, Mark S. Handcock, Ira M. Longini, Jr., and M. Elizabeth Halloran

Full-text: Open access


Acute respiratory diseases are transmitted over networks of social contacts. Large-scale simulation models are used to predict epidemic dynamics and evaluate the impact of various interventions, but the contact behavior in these models is based on simplistic and strong assumptions which are not informed by survey data. These assumptions are also used for estimating transmission measures such as the basic reproductive number and secondary attack rates. Development of methodology to infer contact networks from survey data could improve these models and estimation methods. We contribute to this area by developing a model of within-household social contacts and using it to analyze the Belgian POLYMOD data set, which contains detailed diaries of social contacts in a 24-hour period. We model dependency in contact behavior through a latent variable indicating which household members are at home. We estimate age-specific probabilities of being at home and age-specific probabilities of contact conditional on two members being at home. Our results differ from the standard random mixing assumption. In addition, we find that the probability that all members contact each other on a given day is fairly low: 0.49 for households with two 0–5 year olds and two 19–35 year olds, and 0.36 for households with two 12–18 year olds and two 36+ year olds. We find higher contact rates in households with 2–3 members, helping explain the higher influenza secondary attack rates found in households of this size.
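To make the latent at-home structure concrete, the following is a minimal sketch in Python, not the authors' implementation (their supplement provides R code). It assumes that each member is at home independently with some probability, that a contact between two members can occur only when both are at home, and that contacts are independent given who is at home. The function names and all numeric values are hypothetical illustrations, not estimates from the POLYMOD data. Under these assumptions, every pair can be in contact only if all members are at home, so the probability of a fully connected household is the product of the at-home probabilities and the pairwise conditional contact probabilities.

```python
import itertools
import random


def prob_fully_connected(p_home, p_contact):
    """Exact probability that every pair of members is in contact.

    Assumptions of this sketch (not taken from the paper's code):
    members are at home independently, and contacts are independent
    given the at-home indicators and possible only if both are home.
    """
    prob = 1.0
    for p in p_home:
        prob *= p  # all members must be at home
    for i, j in itertools.combinations(range(len(p_home)), 2):
        prob *= p_contact[i][j]  # every pair must then make contact
    return prob


def simulate_household(p_home, p_contact, rng):
    """Draw one day's contact network as a set of (i, j) pairs."""
    n = len(p_home)
    at_home = [rng.random() < p for p in p_home]  # latent indicators
    edges = set()
    for i, j in itertools.combinations(range(n), 2):
        if at_home[i] and at_home[j] and rng.random() < p_contact[i][j]:
            edges.add((i, j))
    return edges


def estimate_fully_connected(p_home, p_contact, n_sims=20000, seed=0):
    """Monte Carlo check of prob_fully_connected."""
    rng = random.Random(seed)
    n = len(p_home)
    n_pairs = n * (n - 1) // 2
    hits = sum(
        len(simulate_household(p_home, p_contact, rng)) == n_pairs
        for _ in range(n_sims)
    )
    return hits / n_sims


# Hypothetical four-person household: two children, two adults.
P_HOME = [0.9, 0.9, 0.95, 0.95]
P_CONTACT = [[0.95] * 4 for _ in range(4)]  # symmetric; diagonal unused

exact = prob_fully_connected(P_HOME, P_CONTACT)
approx = estimate_fully_connected(P_HOME, P_CONTACT)
print(f"exact={exact:.3f} simulated={approx:.3f}")
```

Even with high per-person and per-pair probabilities, the product over four members and six pairs falls well below one, which is consistent in spirit with the fairly low probabilities of complete contact reported above.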

Article information

Ann. Appl. Stat., Volume 5, Number 3 (2011), 1816-1838.

First available in Project Euclid: 13 October 2011

Keywords: Graphs; social networks; contact networks; latent variable; epidemic model


Potter, Gail E.; Handcock, Mark S.; Longini, Jr., Ira M.; Halloran, M. Elizabeth. Estimating within-household contact networks from egocentric data. Ann. Appl. Stat. 5 (2011), no. 3, 1816–1838. doi:10.1214/11-AOAS474.




Supplemental materials

  • Supplementary material A: Contact network parameters estimated separately for the holiday period versus the nonholiday period, and for 2–3 member households versus 4+ member households. We present parameter estimates computed separately for respondents who reported during the Easter holiday period and during a nonholiday period. Next we report parameters estimated separately for households with 2–3 members and those with 4+ members.
  • Supplementary material B: Results from simulation study exploring weak identifiability. We present simulation results evaluating weak identifiability of our parameters in data sets with low within-household contact rates and low at-home probabilities.
  • Supplementary material C: R code used for estimation, bootstrapping, and simulation in “Estimating within-household contact networks from egocentric data”. This supplement includes the R code used to perform estimation, compute bootstrap confidence intervals, and run a simulation study assessing weak identifiability in households with low contact rates and low probabilities of being at home.