Annals of Applied Statistics

Sex, lies and self-reported counts: Bayesian mixture models for heaping in longitudinal count data via birth–death processes

Forrest W. Crawford, Robert E. Weiss, and Marc A. Suchard



Surveys often ask respondents to report nonnegative counts, but respondents may misremember or round to a nearby multiple of 5 or 10. This phenomenon is called heaping, and the error inherent in heaped self-reported numbers can bias estimation. Heaped data may be collected cross-sectionally or longitudinally and there may be covariates that complicate the inferential task. Heaping is a well-known issue in many survey settings, and inference for heaped data is an important statistical problem. We propose a novel reporting distribution whose underlying parameters are readily interpretable as rates of misremembering and rounding. The process accommodates a variety of heaping grids and allows for quasi-heaping to values nearly but not equal to heaping multiples. We present a Bayesian hierarchical model for longitudinal samples with covariates to infer both the unobserved true distribution of counts and the parameters that control the heaping process. Finally, we apply our methods to longitudinal self-reported counts of sex partners in a study of high-risk behavior in HIV-positive youth.
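As a rough illustration of the heaping phenomenon described above, the following Python sketch simulates respondents who sometimes round a true count to the nearest multiple of 5. This is not the birth–death reporting distribution proposed in the article; it is a much simpler two-component mixture, and the names heap_report, share_on_grid, grid and p_round, the uniform distribution of true counts and the rounding probability of 0.5 are all assumptions made for illustration only.

```python
import random

def heap_report(true_count, grid=5, p_round=0.5, rng=random):
    """Hypothetical reporting rule: with probability p_round the true count
    is rounded to the nearest multiple of `grid` (ties round up);
    otherwise it is reported exactly."""
    if rng.random() < p_round:
        return grid * ((true_count + grid // 2) // grid)
    return true_count

def share_on_grid(counts, grid=5):
    """Fraction of counts that fall exactly on a multiple of `grid`."""
    return sum(c % grid == 0 for c in counts) / len(counts)

# Assumed toy data: true counts drawn uniformly from 0..30.
rng = random.Random(0)
true_counts = [rng.randint(0, 30) for _ in range(10_000)]
reported = [heap_report(c, grid=5, p_round=0.5, rng=rng) for c in true_counts]

print("share on grid, true counts:    ", share_on_grid(true_counts))
print("share on grid, reported counts:", share_on_grid(reported))
```

Comparing the two printed shares shows the excess mass at multiples of 5 that rounding alone produces. A model of the kind the abstract describes would additionally have to account for misremembering and for quasi-heaping near, but not on, the grid.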

Article information

Ann. Appl. Stat., Volume 9, Number 2 (2015), 572–596.

Received: May 2014
Revised: February 2015
First available in Project Euclid: 20 July 2015


Keywords: Bayesian hierarchical model; coarse data; continuous-time Markov chain; heaping; mixture model; rounding


Citation: Crawford, Forrest W.; Weiss, Robert E.; Suchard, Marc A. Sex, lies and self-reported counts: Bayesian mixture models for heaping in longitudinal count data via birth–death processes. Ann. Appl. Stat. 9 (2015), no. 2, 572–596. doi:10.1214/15-AOAS809.




Supplemental materials

  • Supplemental article. We provide a derivation of the Laplace transform of transition probabilities for a general birth–death process (BDP), the full posterior distribution, and an outline of Monte Carlo sampling procedures for unknown parameters.