Statistical Science

On Quantifying Dependence: A Framework for Developing Interpretable Measures

Matthew Reimherr and Dan L. Nicolae

Abstract

We present a framework for selecting and developing measures of dependence when the goal is the quantification of a relationship between two variables, not simply the establishment of its existence. Much of the literature on dependence measures is focused, at least implicitly, on detection or revolves around the inclusion/exclusion of particular axioms and discussing which measures satisfy said axioms. In contrast, we start with only a few nonrestrictive guidelines focused on existence, range and interpretability, which provide a very open and flexible framework. For quantification, the most crucial is the notion of interpretability, whose foundation can be found in the work of Goodman and Kruskal [Measures of Association for Cross Classifications (1979) Springer], and whose importance can be seen in the popularity of tools such as the $R^{2}$ in linear regression. While Goodman and Kruskal focused on probabilistic interpretations for their measures, we demonstrate how more general measures of information can be used to achieve the same goal. To that end, we present a strategy for building dependence measures that is designed to allow practitioners to tailor measures to their needs. We demonstrate how many well-known measures fit in with our framework and conclude the paper by presenting two real data examples. Our first example explores U.S. income and education where we demonstrate how this methodology can help guide the selection and development of a dependence measure. Our second example examines measures of dependence for functional data, and illustrates them using data on geomagnetic storms.
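
The interpretability the abstract attributes to $R^{2}$ can be read as a proportional reduction in uncertainty: the fraction of the uncertainty in $Y$ that is removed, on average, once $X$ is known. The sketch below is a minimal illustration of that reading and not the authors' construction. It assumes a discrete $X$, and the function names (`variance`, `entropy`, `proportional_reduction`) are hypothetical. Plugging in variance gives an $R^{2}$-style quantity, while plugging in Shannon entropy gives an information-based analogue with the same interpretation.

```python
import math
from collections import Counter, defaultdict


def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def entropy(values):
    """Shannon entropy (in nats) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def proportional_reduction(x, y, uncertainty):
    """Return 1 - E[uncertainty(Y | X)] / uncertainty(Y) for discrete x.

    `uncertainty` is any function mapping a sample of Y to a nonnegative
    number (hypothetical interface chosen for this illustration).
    """
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    total = uncertainty(y)
    if total == 0:
        raise ValueError("y has no uncertainty to explain")

    n = len(y)
    # Average the within-group uncertainty, weighted by group frequency.
    conditional = sum(len(g) / n * uncertainty(g) for g in groups.values())
    return 1.0 - conditional / total


if __name__ == "__main__":
    x = [0, 0, 1, 1, 2, 2]
    y = [1.0, 1.2, 2.0, 2.1, 3.0, 3.3]
    print("variance-based:", proportional_reduction(x, y, variance))
    print("entropy-based: ", proportional_reduction(x, [round(v) for v in y], entropy))
```

Both versions return 0 when knowing $X$ leaves the uncertainty in $Y$ unchanged and 1 when $X$ determines $Y$ exactly, which is what makes values in between readable as "fraction explained."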

Article information

Source
Statist. Sci., Volume 28, Number 1 (2013), 116-130.

Dates
First available in Project Euclid: 29 January 2013

Permanent link to this document
https://projecteuclid.org/euclid.ss/1359468411

Digital Object Identifier
doi:10.1214/12-STS405

Mathematical Reviews number (MathSciNet)
MR3075341

Zentralblatt MATH identifier
1332.62189

Keywords
Measures of dependence quantification information metrics functional data interpretability uses of dependence

Citation

Reimherr, Matthew; Nicolae, Dan L. On Quantifying Dependence: A Framework for Developing Interpretable Measures. Statist. Sci. 28 (2013), no. 1, 116–130. doi:10.1214/12-STS405. https://projecteuclid.org/euclid.ss/1359468411


References

  • Ash, R. B. (1990). Information Theory. Dover, New York.
  • Bell, C. B. (1962). Mutual information and maximal correlation as measures of dependence. Ann. Math. Statist. 33 587–595.
  • Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge.
  • Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory, 2nd ed. Wiley, Hoboken, NJ.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 39 1–38.
  • Doksum, K. and Samarov, A. (1995). Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression. Ann. Statist. 23 1443–1473.
  • Ebrahimi, N., Soofi, E. S. and Soyer, R. (2010). Information measures in perspective. International Statistical Review 78 383–412.
  • Efron, B. (1978). Regression and ANOVA with zero-one data: Measures of residual variation. J. Amer. Statist. Assoc. 73 113–121.
  • Gelfand, I. M. and Fomin, S. V. (1963). Calculus of Variations. Prentice Hall International, Englewood Cliffs, NJ.
  • Goodman, L. A. and Kruskal, W. H. (1979). Measures of Association for Cross Classifications. Springer Series in Statistics 1. Springer, New York.
  • Gray, R. M. (2011). Entropy and Information Theory. Springer, New York.
  • Hall, W. J. (1970). On characterizing dependence in joint distributions. In Essays in Probability and Statistics 339–376. Univ. North Carolina Press, Chapel Hill, NC.
  • Horváth, L., Kokoszka, P. and Reimherr, M. (2009). Two sample inference in functional linear models. Canad. J. Statist. 37 571–591.
  • Lehmann, E. L. (1966). Some concepts of dependence. Ann. Math. Statist. 37 1137–1153.
  • Liang, K.-Y., Zeger, S. L. and Qaqish, B. (1992). Multivariate regression analyses for categorical data. J. R. Stat. Soc. Ser. B Stat. Methodol. 54 3–40.
  • Lipsitz, S. R., Laird, N. M. and Harrington, D. P. (1991). Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78 153–160.
  • McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, FL.
  • Moskowitz, C. (2011). U.S. must take space storm threat seriously, experts warn. Available at http://www.space.com/10906-space-storms-threat.html.
  • Nelsen, R. B. (2010). An Introduction to Copulas. Springer, New York.
  • Nicolae, D. L. (2006). Quantifying the amount of missing information in genetic association studies. Genet. Epidemiol. 30 703–717.
  • Nicolae, D. L., Meng, X.-L. and Kong, A. (2008). Quantifying the fraction of missing information for hypothesis testing in statistical and genetic studies. Statist. Sci. 23 287–312.
  • Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. Springer, New York.
  • Reimherr, M. and Nicolae, D. L. (2011). You’ve gotta be lucky: Coverage and the elusive gene–gene interaction. Ann. Hum. Genet. 75 105–111.
  • Rényi, A. (1959). On measures of dependence. Acta Math. Acad. Sci. Hungar. 10 441–451.
  • Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M. and Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science 334 1518–1524.
  • Schweizer, B. and Wolff, E. F. (1981). On nonparametric measures of dependence for random variables. Ann. Statist. 9 879–885.
  • Siburg, K. F. and Stoimenov, P. A. (2010). A measure of mutual complete dependence. Metrika 71 239–251.
  • Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35 2769–2794.
  • Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Ann. Appl. Stat. 3 1236–1265.
  • U.S. Census Bureau (2010). Educational attainment—people 25 years old and over. Available at http://www.census.gov/hhes/www/cpstables/032010/perinc/new03_001.htm.