The Annals of Applied Statistics

Sex, lies and self-reported counts: Bayesian mixture models for heaping in longitudinal count data via birth–death processes

Forrest W. Crawford, Robert E. Weiss, and Marc A. Suchard

Full-text: Open access


Surveys often ask respondents to report nonnegative counts, but respondents may misremember or round to a nearby multiple of 5 or 10. This phenomenon is called heaping, and the error inherent in heaped self-reported numbers can bias estimation. Heaped data may be collected cross-sectionally or longitudinally and there may be covariates that complicate the inferential task. Heaping is a well-known issue in many survey settings, and inference for heaped data is an important statistical problem. We propose a novel reporting distribution whose underlying parameters are readily interpretable as rates of misremembering and rounding. The process accommodates a variety of heaping grids and allows for quasi-heaping to values nearly but not equal to heaping multiples. We present a Bayesian hierarchical model for longitudinal samples with covariates to infer both the unobserved true distribution of counts and the parameters that control the heaping process. Finally, we apply our methods to longitudinal self-reported counts of sex partners in a study of high-risk behavior in HIV-positive youth.
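As a rough illustration of the heaping phenomenon described above, the following sketch simulates rounding to a grid of multiples of 5. It is a toy mechanism, not the birth–death reporting distribution proposed in the article. The function names `heap_counts` and `share_on_grid`, the rounding probability, and the uniform "true" counts are all assumptions made for this example.

```python
import random

def heap_counts(true_counts, round_prob=0.4, grid=5, seed=0):
    """Toy heaping: with probability `round_prob`, report the nearest
    multiple of `grid` instead of the true count (assumed mechanism)."""
    rng = random.Random(seed)
    reported = []
    for n in true_counts:
        if rng.random() < round_prob:
            # Round to the nearest grid multiple (ties go up).
            reported.append(grid * ((n + grid // 2) // grid))
        else:
            # Report the true count unchanged.
            reported.append(n)
    return reported

def share_on_grid(counts, grid=5):
    """Fraction of counts that are exact multiples of `grid`."""
    return sum(1 for n in counts if n % grid == 0) / len(counts)

# Hypothetical "true" counts, uniform on 0..20, purely for illustration.
rng = random.Random(1)
true_counts = [rng.randint(0, 20) for _ in range(10_000)]
reported = heap_counts(true_counts)

print(share_on_grid(true_counts))  # near 5/21 in expectation
print(share_on_grid(reported))     # larger, since rounding piles mass on the grid
```

Under these assumptions the reported counts show excess mass at multiples of 5, which is the signature of heaping that the model in the article is designed to account for.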

Article information

Ann. Appl. Stat., Volume 9, Number 2 (2015), 572-596.

Received: May 2014
Revised: February 2015
First available in Project Euclid: 20 July 2015

Keywords

Bayesian hierarchical model; coarse data; continuous-time Markov chain; heaping; mixture model; rounding


Supplemental materials

  • Supplemental article. We provide a derivation of the Laplace transform of transition probabilities for a general birth–death process (BDP), the full posterior distribution, and an outline of Monte Carlo sampling procedures for unknown parameters.