A general framework for association analysis of heterogeneous data

Gen Li; Irina Gaynanova

doi:10.1214/17-AOAS1127

Ann. Appl. Stat. 12(3): 1700-1726 (September 2018). DOI: 10.1214/17-AOAS1127

Abstract

Multivariate association analysis is of primary interest in many applications. Despite the prevalence of high-dimensional and non-Gaussian data (such as count-valued or binary), most existing methods only apply to low-dimensional data with continuous measurements. Motivated by the Computer Audition Lab 500-song (CAL500) music annotation study, we develop a new framework for the association analysis of two sets of high-dimensional and heterogeneous (continuous/binary/count) data. We model heterogeneous random variables using exponential family distributions, and exploit a structured decomposition of the underlying natural parameter matrices to identify shared and individual patterns for two data sets. We also introduce a new measure of the strength of association, and a permutation-based procedure to test its significance. An alternating iteratively reweighted least squares algorithm is devised for model fitting, and several variants are developed to expedite computation and achieve variable selection. The application to the CAL500 data sheds light on the relationship between acoustic features and semantic annotations, and provides effective means for automatic music annotation and retrieval.
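The permutation-based significance test mentioned in the abstract can be illustrated with a minimal, generic sketch. The abstract does not define the proposed association measure, so the statistic below (mean squared Pearson correlation over all column pairs) is only a hypothetical stand-in, and the names `association_stat`, `permutation_pvalue`, and `make_toy_data` are illustrative assumptions rather than anything from the paper. The idea shown is the usual one: shuffling the rows of one data set breaks the sample pairing, which gives a reference distribution for the statistic under no association.

```python
import random
from statistics import fmean


def column(matrix, j):
    """Return column j of a list-of-rows matrix."""
    return [row[j] for row in matrix]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = fmean(a), fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den_a = sum((x - ma) ** 2 for x in a)
    den_b = sum((y - mb) ** 2 for y in b)
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b) ** 0.5


def association_stat(X, Y):
    """Hypothetical stand-in statistic (NOT the paper's association coefficient):
    mean squared correlation over all pairs of columns of X and Y."""
    p, q = len(X[0]), len(Y[0])
    total = sum(
        pearson(column(X, i), column(Y, j)) ** 2
        for i in range(p)
        for j in range(q)
    )
    return total / (p * q)


def permutation_pvalue(X, Y, n_perm=500, seed=0):
    """Permutation test: shuffle the rows of Y to break the pairing with X."""
    rng = random.Random(seed)
    observed = association_stat(X, Y)
    exceed = 0
    Y_perm = list(Y)  # shallow copy; the rows themselves are left intact
    for _ in range(n_perm):
        rng.shuffle(Y_perm)
        if association_stat(X, Y_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed arrangement itself, so the p-value is never 0.
    return observed, (1 + exceed) / (1 + n_perm)


def make_toy_data(n=80, seed=1, linked=True):
    """Toy heterogeneous data: X is continuous, Y holds one binary and one count column."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # latent factor shared by X and Y when linked=True
        driver = z if linked else rng.gauss(0, 1)
        X.append([z + 0.5 * rng.gauss(0, 1), rng.gauss(0, 1)])
        Y.append([1 if driver + 0.5 * rng.gauss(0, 1) > 0 else 0, rng.randint(0, 5)])
    return X, Y
```

Calling `permutation_pvalue(*make_toy_data(linked=True))` returns the observed statistic together with its permutation p-value. The smallest attainable p-value is 1 / (n_perm + 1), so `n_perm` bounds the resolution of the test.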

Citation

Gen Li, Irina Gaynanova. "A general framework for association analysis of heterogeneous data." Ann. Appl. Stat. 12 (3): 1700-1726, September 2018. https://doi.org/10.1214/17-AOAS1127

Information

Received: 1 February 2017; Revised: 1 November 2017; Published: September 2018

First available in Project Euclid: 11 September 2018

zbMATH: 06979648

MathSciNet: MR3852694

Digital Object Identifier: 10.1214/17-AOAS1127

Keywords: association coefficient, exponential family, generalized linear model, inter-battery factor analysis, joint and individual structure, matrix decomposition