The Annals of Applied Statistics

A testing based approach to the discovery of differentially correlated variable sets

Kelly Bodwin, Kai Zhang, and Andrew Nobel

Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. We introduce a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative search procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. We investigate the performance of DCM by applying it to simulated data as well as to recent experimental datasets in genomics and brain imaging.
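As an illustration of the quantity described in the abstract, the following is a minimal sketch, not the authors' implementation. It computes the average pairwise differential correlation of a candidate variable set, and each variable's average differential correlation with that set. The function names `avg_diff_corr` and `per_variable_scores` are hypothetical. The data matrices are assumed to have samples in rows and variables in columns, and the candidate set is assumed to contain at least two variables. The hypothesis test based on the asymptotic distribution, which DCM uses to decide updates, is not reproduced here.

```python
import numpy as np


def avg_diff_corr(x1, x2, subset):
    """Average pairwise correlation within `subset` under condition 1
    minus the same average under condition 2 (hypothetical helper)."""
    idx = np.asarray(subset)
    k = len(idx)
    diff = (np.corrcoef(x1[:, idx], rowvar=False)
            - np.corrcoef(x2[:, idx], rowvar=False))
    np.fill_diagonal(diff, 0.0)  # self-correlations carry no information
    # Mean over the k * (k - 1) ordered off-diagonal pairs.
    return diff.sum() / (k * (k - 1))


def per_variable_scores(x1, x2, subset):
    """For every variable, the mean differential correlation with the
    members of `subset`, excluding the variable itself (hypothetical helper)."""
    diff = np.corrcoef(x1, rowvar=False) - np.corrcoef(x2, rowvar=False)
    np.fill_diagonal(diff, 0.0)
    idx = np.asarray(subset)
    is_member = np.zeros(diff.shape[0], dtype=bool)
    is_member[idx] = True
    sums = diff[:, idx].sum(axis=1)
    # Members are compared with the other k - 1 members, non-members with all k.
    counts = np.where(is_member, len(idx) - 1, len(idx))
    return sums / counts
```

In an iterative search of the kind the abstract describes, scores like these would be recomputed after each change to the candidate set. A variable would then be kept or dropped according to a test of its score rather than a fixed cutoff.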

Article information

Ann. Appl. Stat., Volume 12, Number 2 (2018), 1180-1203.

Received: February 2016
Revised: March 2017
First available in Project Euclid: 28 July 2018

Keywords: Differential correlation mining, association mining, biostatistics, genomics, high-dimensional data


Bodwin, Kelly; Zhang, Kai; Nobel, Andrew. A testing based approach to the discovery of differentially correlated variable sets. Ann. Appl. Stat. 12 (2018), no. 2, 1180-1203. doi:10.1214/17-AOAS1083.

Supplemental materials

  • Differential correlation mining: Supplementary material. We provide the proof of Corollary 1.1, the derivation of the variance estimator, additional simulation results, extended real data results, and pseudocode for the algorithmic procedures.