Electronic Journal of Statistics

A Bayesian nonparametric model for white blood cells in patients with lower urinary tract symptoms

William Barcella, Maria De Iorio, Gianluca Baio, and James Malone-Lee

Full-text: Open access


Lower Urinary Tract Symptoms (LUTS) affect a significant proportion of the population and often lead to a reduced quality of life. LUTS overlap across a wide variety of diseases, which makes the diagnostic process extremely complicated. In this work we focus on the relation between LUTS and Urinary Tract Infection (UTI). The latter is detected through the number of White Blood Cells (WBC) in a sample of urine: WBC $\geq 1$ indicates UTI, and high levels may indicate complications. The objective of this work is to provide clinicians with a tool for supporting the diagnostic process, deepening the available knowledge about LUTS and UTI. We analyze data recording both the LUTS profile and the WBC count for each patient. We propose to model the WBC counts using a random partition model, in which we specify a prior distribution over the partition of the patients that includes the clustering information contained in the LUTS profile. Then, within each cluster, the WBC counts are assumed to be generated by a zero-inflated Poisson distribution. The predictive distribution allows us to identify the symptom configurations most associated with the presence of UTI as well as with severe infections.
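As a rough illustration of the within-cluster likelihood described above, the following minimal Python sketch evaluates a zero-inflated Poisson probability mass function and the implied probability that WBC $\geq 1$. The parameter names `pi` (probability of an excess zero) and `lam` (Poisson mean) and the numeric values are assumptions chosen for illustration; the abstract does not state the parametrization used in the paper, and the sketch covers a single cluster only, not the random partition prior.

```python
import math


def zip_pmf(y: int, pi: float, lam: float) -> float:
    """P(Y = y) for a zero-inflated Poisson.

    With probability `pi` the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with mean `lam`. Both names are
    illustrative notation, not taken from the paper.
    """
    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    return pi * (1.0 if y == 0 else 0.0) + (1.0 - pi) * poisson


def prob_wbc_at_least_one(pi: float, lam: float) -> float:
    """P(Y >= 1), matching the WBC >= 1 threshold mentioned in the abstract."""
    return 1.0 - zip_pmf(0, pi, lam)


if __name__ == "__main__":
    # Hypothetical cluster-specific parameters, for illustration only.
    pi, lam = 0.3, 2.0
    print(zip_pmf(0, pi, lam))
    print(prob_wbc_at_least_one(pi, lam))
```

Under this parametrization the probability of at least one white blood cell reduces to $(1-\pi)(1-e^{-\lambda})$, so both the zero-inflation weight and the Poisson mean affect the implied probability of UTI.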

Article information

Electron. J. Statist., Volume 10, Number 2 (2016), 3287-3309.

Received: December 2015
First available in Project Euclid: 16 November 2016

Keywords: Bayesian nonparametric; zero-inflated Poisson distribution; Dirichlet process mixture model; random partition model; clustering with covariates


Barcella, William; De Iorio, Maria; Baio, Gianluca; Malone-Lee, James. A Bayesian nonparametric model for white blood cells in patients with lower urinary tract symptoms. Electron. J. Statist. 10 (2016), no. 2, 3287--3309. doi:10.1214/16-EJS1177. https://projecteuclid.org/euclid.ejs/1479287222

