Electronic Journal of Statistics

Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures

Sarah Filippi, Chris C. Holmes, and Luis E. Nieto-Barajas

Full-text: Open access

Abstract

In this article we propose novel Bayesian nonparametric methods using Dirichlet Process Mixture (DPM) models for detecting pairwise dependence between random variables while accounting for uncertainty in the form of the underlying distributions. A key criterion is that the procedures should scale to large data sets. In this regard, we find that the formal calculation of the Bayes factor for a dependent-vs.-independent DPM joint probability measure is not computationally feasible. To address this, we present Bayesian diagnostic measures for characterising evidence against a “null model” of pairwise independence. In simulation studies, as well as in a real data analysis, we show that our approach provides a useful tool for the exploratory nonparametric Bayesian analysis of large multivariate data sets.
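To make the notion of a dependent-vs.-independent Bayes factor concrete, the following is a minimal sketch for the much simpler finite case of a two-way contingency table with conjugate Dirichlet priors. It is not the DPM procedure or the diagnostic measures proposed in the article; the function names and the symmetric prior weight `prior` are assumptions chosen purely for illustration.

```python
from math import lgamma


def log_multivariate_beta(alphas):
    """Log of the multivariate beta function B(alpha_1, ..., alpha_k)."""
    return sum(lgamma(a) for a in alphas) - lgamma(sum(alphas))


def log_marginal(counts, prior):
    """Log marginal likelihood of an observed sequence with the given
    category counts under a symmetric Dirichlet(prior) prior.
    (Hypothetical helper, not taken from the article.)"""
    posterior = [prior + n for n in counts]
    return log_multivariate_beta(posterior) - log_multivariate_beta(
        [prior] * len(counts)
    )


def log_bayes_factor_dependence(table, prior=1.0):
    """Log Bayes factor of a dependent model (one Dirichlet over all cells)
    against an independent model (separate Dirichlets over row and column
    margins). Positive values favour dependence."""
    cells = [n for row in table for n in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    log_dependent = log_marginal(cells, prior)
    log_independent = log_marginal(row_totals, prior) + log_marginal(
        col_totals, prior
    )
    return log_dependent - log_independent


if __name__ == "__main__":
    diagonal = [[50, 0], [0, 50]]   # strongly dependent counts
    uniform = [[25, 25], [25, 25]]  # counts consistent with independence
    print(log_bayes_factor_dependence(diagonal))
    print(log_bayes_factor_dependence(uniform))
```

In this finite setting both marginal likelihoods are available in closed form, which is the tractability that the abstract reports is lost for the DPM joint probability measure.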

Article information

Source
Electron. J. Statist., Volume 10, Number 2 (2016), 3338-3354.

Dates
Received: December 2015
First available in Project Euclid: 16 November 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1479287224

Digital Object Identifier
doi:10.1214/16-EJS1171

Mathematical Reviews number (MathSciNet)
MR3572852

Zentralblatt MATH identifier
1358.62058

Keywords
Bayes nonparametrics; contingency table; dependence measure; hypothesis testing; mixture model; mutual information

Citation

Filippi, Sarah; Holmes, Chris C.; Nieto-Barajas, Luis E. Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures. Electron. J. Statist. 10 (2016), no. 2, 3338--3354. doi:10.1214/16-EJS1171. https://projecteuclid.org/euclid.ejs/1479287224



