Open Access
An empirical study of the maximal and total information coefficients and leading measures of dependence
David N. Reshef, Yakir A. Reshef, Pardis C. Sabeti, Michael Mitzenmacher
Ann. Appl. Stat. 12(1): 123-155 (March 2018). DOI: 10.1214/17-AOAS1093

Abstract

In exploratory data analysis, we are often interested in identifying promising pairwise associations for further analysis while filtering out weaker ones. This can be accomplished by computing a measure of dependence on all variable pairs and examining the highest-scoring pairs, provided the measure of dependence used assigns similar scores to equally noisy relationships of different types. This property, called equitability and formalized in previous work, can be used to assess measures of dependence, along with the power of their corresponding independence tests and their runtime.

Here we present an empirical evaluation of the equitability, power against independence, and runtime of several leading measures of dependence. These include the two recently introduced and simultaneously computable statistics ${\mbox{MIC}_{e}}$, whose goal is equitability, and ${\mbox{TIC}_{e}}$, whose goal is power against independence.

Regarding equitability, our analysis finds that ${\mbox{MIC}_{e}}$ is the most equitable method on functional relationships in most of the settings we considered. Regarding power against independence, we find that ${\mbox{TIC}_{e}}$ and Heller and Gorfine’s ${S^{\mathrm{DDP}}}$ share state-of-the-art performance, with several other methods achieving excellent power as well. Our analyses also show evidence for a trade-off between power against independence and equitability consistent with recent theoretical work. Our results suggest that a fast and useful strategy for achieving a combination of power against independence and equitability is to filter relationships by ${\mbox{TIC}_{e}}$ and then to rank the remaining ones using ${\mbox{MIC}_{e}}$. We confirm our findings on a set of data collected by the World Health Organization.
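
The filter-then-rank strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not compute ${\mbox{TIC}_{e}}$ or ${\mbox{MIC}_{e}}$: the names `filter_then_rank`, `abs_pearson`, and `threshold` are hypothetical, and the absolute Pearson correlation is only a placeholder standing in for both statistics. In practice, the filtering cutoff would presumably come from an independence test (for example, a null distribution of the filter statistic) rather than a fixed number.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# A dependence statistic maps two equal-length samples to a score.
Stat = Callable[[Sequence[float], Sequence[float]], float]


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Placeholder statistic: |Pearson r|. This is NOT MIC_e or TIC_e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear association
    return abs(sxy) / (sxx * syy) ** 0.5


def filter_then_rank(
    data: Dict[str, Sequence[float]],
    filter_stat: Stat,   # stands in for a power-oriented statistic
    rank_stat: Stat,     # stands in for an equitability-oriented statistic
    threshold: float,    # hypothetical cutoff for the filtering step
) -> List[Tuple[str, str, float]]:
    """Keep pairs whose filter score reaches the threshold, then sort
    the survivors by their ranking score, highest first."""
    kept = []
    for a, b in combinations(sorted(data), 2):
        if filter_stat(data[a], data[b]) >= threshold:
            kept.append((a, b, rank_stat(data[a], data[b])))
    return sorted(kept, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    toy = {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10],
        "z": [1, -1, 1, -1, 1],
    }
    for a, b, score in filter_then_rank(toy, abs_pearson, abs_pearson, 0.5):
        print(a, b, round(score, 3))
```

Keeping the two statistics as separate arguments mirrors the division of labor in the strategy: one statistic decides which pairs survive, and a different one orders the survivors.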

Citation

David N. Reshef, Yakir A. Reshef, Pardis C. Sabeti, Michael Mitzenmacher. "An empirical study of the maximal and total information coefficients and leading measures of dependence." Ann. Appl. Stat. 12(1): 123-155, March 2018. https://doi.org/10.1214/17-AOAS1093

Information

Received: 1 December 2016; Revised: 1 August 2017; Published: March 2018
First available in Project Euclid: 9 March 2018

zbMATH: 06894701
MathSciNet: MR3773388
Digital Object Identifier: 10.1214/17-AOAS1093

Keywords: equitability, maximal information coefficient, measures of dependence, statistical power, total information coefficient

Rights: Copyright © 2018 Institute of Mathematical Statistics

Vol. 12 • No. 1 • March 2018