The Annals of Applied Statistics

Truth and memory: Linking instantaneous and retrospective self-reported cigarette consumption

Hao Wang, Saul Shiffman, Sandra D. Griffith, and Daniel F. Heitjan



Studies of smoking behavior commonly use the time-line follow-back (TLFB) method, or periodic retrospective recall, to gather data on daily cigarette consumption. TLFB is considered adequate for identifying periods of abstinence and lapse but not for measuring daily cigarette consumption, owing to substantial recall and digit preference biases. With the development of the hand-held electronic diary (ED), it has become possible to collect cigarette consumption data using ecological momentary assessment (EMA), the instantaneous recording of each cigarette as it is smoked. Because EMA data do not rely on retrospective recall, they are thought to measure cigarette consumption more accurately. In this article we present an analysis of consumption data collected simultaneously by both methods from 236 active smokers in the pre-quit phase of a smoking cessation study. We define a statistical model that describes the genesis of the TLFB records as a two-stage process of mis-remembering and rounding, including fixed and random effects at each stage. We estimate the model by Bayesian methods and evaluate its adequacy by studying histograms of imputed values of the latent remembered cigarette count. Our analysis suggests that both mis-remembering and heaping contribute substantially to the distortion of self-reported cigarette counts. Higher nicotine dependence, white ethnicity and male sex are associated with greater remembered smoking given the EMA count. The model is potentially useful in other applications where it is desirable to understand the process by which subjects remember and report true observations.
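The two-stage idea in the abstract (a true count is first mis-remembered, then the remembered value is rounded, sometimes to a "heap" such as a multiple of 10) can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' specification: the lognormal mis-remembering stage, the single heaping unit of 10, the fixed heaping probability, and all names and parameter values (`remember`, `report`, `simulate_diary`, `beta0`, `sigma`, `tau`, `p_heap`, `unit`) are assumptions chosen for illustration. The published model includes covariates and random effects at both stages and is estimated by Bayesian methods rather than simply simulated forward.

```python
import math
import random


def remember(ema_count, subject_effect, beta0=0.0, sigma=0.2, rng=random):
    """Stage 1 (hypothetical): draw a latent remembered count.

    On the log scale, remembered = log(EMA count) + fixed shift beta0
    + subject random effect + Gaussian noise with s.d. sigma.
    """
    mu = math.log(ema_count) + beta0 + subject_effect
    return math.exp(rng.gauss(mu, sigma))


def report(remembered, p_heap=0.5, unit=10, rng=random):
    """Stage 2 (hypothetical): turn the remembered count into a reported integer.

    With probability p_heap the value is rounded to the nearest multiple of
    `unit` (heaping); otherwise it is rounded to the nearest whole cigarette.
    """
    if rng.random() < p_heap:
        return unit * round(remembered / unit)
    return round(remembered)


def simulate_diary(n_subjects=5, n_days=7, tau=0.3, seed=0):
    """Simulate (subject, day, EMA count, TLFB report) tuples under the sketch above."""
    rng = random.Random(seed)
    records = []
    for i in range(n_subjects):
        b_i = rng.gauss(0.0, tau)  # subject-level random effect on remembering
        for d in range(n_days):
            # Hypothetical "true" daily count recorded by EMA (at least 1 cigarette).
            ema = max(1, round(rng.gauss(18, 5)))
            tlfb = report(remember(ema, b_i, rng=rng), rng=rng)
            records.append((i, d, ema, tlfb))
    return records


if __name__ == "__main__":
    recs = simulate_diary(n_subjects=50, n_days=14)
    share = sum(t % 10 == 0 for *_, t in recs) / len(recs)
    print(f"share of simulated TLFB reports at multiples of 10: {share:.2f}")
```

Separating the two stages makes it possible to see how even unbiased remembering (`beta0 = 0`) can produce spikes at round numbers, while a nonzero shift or subject effect moves the whole distribution away from the EMA count.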

Article information

Ann. Appl. Stat., Volume 6, Number 4 (2012), 1689-1706.

First available in Project Euclid: 27 December 2012


Keywords: Bayesian analysis; heaping; latent variables; longitudinal data; smoking cessation


Wang, Hao; Shiffman, Saul; Griffith, Sandra D.; Heitjan, Daniel F. Truth and memory: Linking instantaneous and retrospective self-reported cigarette consumption. Ann. Appl. Stat. 6 (2012), no. 4, 1689–1706. doi:10.1214/12-AOAS557.



