Bayesian Analysis

A Bayesian Nonparametric Approach to Testing for Dependence Between Random Variables

Sarah Filippi and Chris C. Holmes

Full-text: Open access

Abstract

Nonparametric and nonlinear measures of statistical dependence between pairs of random variables are important tools in modern data analysis. In particular, the emergence of large data sets can now support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence versus independence. Our approach uses Pólya tree priors on the space of probability measures, which can then be embedded within a decision-theoretic test for dependence. Pólya tree priors can accommodate known uncertainty in the form of the underlying sampling distribution and provide an explicit posterior probability measure of both dependence and independence. Well-known advantages of having an explicit probability measure include easy comparison of evidence across different studies, encoding of prior information, quantification of changes in dependence across different experimental conditions, and integration of results within formal decision analysis.
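The abstract describes an analytic quantification of the evidence for dependence versus independence built on Pólya tree priors. The Python sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: both variables are mapped to pseudo-uniform margins by ranks; under dependence, the pairs follow a finite Pólya tree on [0, 1]^2 whose cells split into four equal quadrants, with Dirichlet parameter `c * j**2` at level `j` (assumed here); under independence, the pairs follow the uniform (independence) copula, whose log-likelihood is 0. Because each tree node contributes a closed-form Dirichlet-multinomial ratio, the log Bayes factor and the posterior probability need no sampling. The function names and the defaults `depth=6` and `c=1.0` are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def _log_dm_ratio(counts, alpha):
    """log[B(alpha + n) / B(alpha)] for a symmetric Dirichlet(alpha, ..., alpha),
    with B(a) = prod_k Gamma(a_k) / Gamma(sum_k a_k)."""
    k = len(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + sum(counts))
    for n_k in counts:
        out += math.lgamma(alpha + n_k) - math.lgamma(alpha)
    return out


def log_marginal_polya_tree(points, depth=6, c=1.0):
    """Log marginal likelihood of points in (0, 1)^2 under a finite Polya tree
    centred on the uniform density (which itself has log-likelihood 0).
    Each cell splits into 4 equal quadrants; level-j quadrants get the
    Dirichlet parameter c * j**2 (an assumed, illustrative choice)."""
    total = 0.0
    for j in range(1, depth + 1):
        alpha = c * j * j
        scale = 2 ** j
        children = {}  # parent cell at level j - 1 -> counts in its 4 quadrants
        for u, v in points:
            iu = min(int(u * scale), scale - 1)
            iv = min(int(v * scale), scale - 1)
            quadrant = (iu % 2) + 2 * (iv % 2)
            children.setdefault((iu // 2, iv // 2), [0, 0, 0, 0])[quadrant] += 1
        for counts in children.values():
            # Dirichlet-multinomial ratio, plus n * log(4) because each quadrant
            # has a quarter of its parent's area under the uniform base measure
            total += _log_dm_ratio(counts, alpha) + sum(counts) * math.log(4)
    return total


def pseudo_uniform(values):
    """Map values to (0, 1) by rank / (n + 1); ties are broken by position (assumption)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r / (n + 1)
    return ranks


def posterior_prob_dependence(x, y, prior_dep=0.5, depth=6, c=1.0):
    """P(dependence | data): Polya tree on the rank-transformed pairs versus
    the independence copula (uniform on [0, 1]^2)."""
    pts = list(zip(pseudo_uniform(x), pseudo_uniform(y)))
    log_bf = log_marginal_polya_tree(pts, depth, c)  # log p(data | dep) - log p(data | indep)
    log_odds = log_bf + math.log(prior_dep / (1.0 - prior_dep))
    if log_odds >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    random.seed(1)
    x = [random.gauss(0.0, 1.0) for _ in range(300)]
    y_dep = [xi ** 2 + 0.1 * random.gauss(0.0, 1.0) for xi in x]  # nonlinear, near-zero correlation
    y_ind = [random.gauss(0.0, 1.0) for _ in range(300)]
    print(posterior_prob_dependence(x, y_dep))
    print(posterior_prob_dependence(x, y_ind))
```

Under these assumptions, a strongly nonlinear relation such as `y = x**2 + noise`, whose Pearson correlation is close to zero, should receive a posterior probability of dependence near 1, while independent draws should receive a small one. Thresholding the returned probability is one simple way to turn it into a decision, in the spirit of the decision-theoretic test the abstract mentions; the threshold itself is a choice outside this sketch.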

Article information

Source
Bayesian Anal. Volume 12, Number 4 (2017), 919-938.

Dates
First available in Project Euclid: 21 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.ba/1474463236

Digital Object Identifier
doi:10.1214/16-BA1027

Keywords
dependence measure; Bayesian nonparametrics; Pólya tree; hypothesis testing

Rights
Creative Commons Attribution 4.0 International License.

Citation

Filippi, Sarah; Holmes, Chris C. A Bayesian Nonparametric Approach to Testing for Dependence Between Random Variables. Bayesian Anal. 12 (2017), no. 4, 919-938. doi:10.1214/16-BA1027. https://projecteuclid.org/euclid.ba/1474463236