June 2021 Large-scale multiple inference of collective dependence with applications to protein function
Robert Jernigan, Kejue Jia, Zhao Ren, Wen Zhou
Ann. Appl. Stat. 15(2): 902-924 (June 2021). DOI: 10.1214/20-AOAS1431

Abstract

Measuring the dependence of k ≥ 3 random variables and drawing inference from such higher-order dependences are scientifically important yet challenging. Motivated here by protein coevolution with multivariate categorical features, we consider an information theoretic measure of higher-order dependence. The proposed collective dependence is a symmetrization of differential interaction information which generalizes the mutual information of a pair of random variables. We show that the collective dependence can be easily estimated and facilitates a test on the dependence of k ≥ 3 random variables. Upon carefully exploring the null space of collective dependence, we devise a Classification-Assisted Large scaLe inference procedure to DEtect significant k-COllective DEpendence among d ≥ k random variables, with the false discovery rate controlled. Finite sample performance of our method is examined via simulations. We apply this method to multiple protein sequence alignment data to study residue (position) coevolution in two protein families, the elongation factor P family and the zinc knuckle family. We identify novel functional triplets of amino acid residues, whose contributions to protein function are further investigated. These confirm that the collective dependence does yield additional information important for understanding protein coevolution compared to pairwise measures.
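
To make the idea of dependence beyond pairs concrete, the following is a minimal Python sketch of the classical plug-in interaction information for three categorical variables. It is an illustration only, not the collective dependence measure or the inference procedure of the paper. The function names and the toy XOR data are assumptions introduced for this example, and the sign convention I(X;Y|Z) - I(X;Y) is one of several in use.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Plug-in (empirical) Shannon entropy, in bits, of a list of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Plug-in mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information(x, y, z):
    """Plug-in I(X;Y|Z) - I(X;Y); the sign convention is an assumption of this sketch."""
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    cond_mi = (
        entropy(list(zip(x, z)))
        + entropy(list(zip(y, z)))
        - entropy(list(zip(x, y, z)))
        - entropy(z)
    )
    return cond_mi - mutual_information(x, y)


# Hypothetical toy data: every (x, y) combination appears equally often and
# z is the XOR of x and y, so no pair of variables is dependent on its own.
pairs = list(product([0, 1], repeat=2)) * 25
x = [a for a, _ in pairs]
y = [b for _, b in pairs]
z = [a ^ b for a, b in pairs]

pairwise = [mutual_information(x, y), mutual_information(x, z), mutual_information(y, z)]
three_way = interaction_information(x, y, z)
print("pairwise mutual information (bits):", pairwise)
print("interaction information (bits):", three_way)
```

On this toy data each pairwise mutual information is zero while the three-way quantity is one bit, which is the kind of signal a purely pairwise analysis would miss.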

Funding Statement

The first and second authors were supported, in part, by NIH Grant R01-GM127701. The third author was supported, in part, by NSF Grant DMS-1812030, an AMS Simons Travel Grant and the Central Research Development Fund at the University of Pittsburgh. The fourth author was supported, in part, by DOE Grant DE-SC0018344 and NSF Grants IIS-1545994 and IOS-1922701.

Acknowledgments

The authors would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that improved the quality of this paper.

Citation


Robert Jernigan, Kejue Jia, Zhao Ren, Wen Zhou. "Large-scale multiple inference of collective dependence with applications to protein function." Ann. Appl. Stat. 15(2): 902-924, June 2021. https://doi.org/10.1214/20-AOAS1431

Information

Received: 1 December 2019; Revised: 1 November 2020; Published: June 2021
First available in Project Euclid: 12 July 2021

MathSciNet: MR4298967
zbMATH: 1477.62314
Digital Object Identifier: 10.1214/20-AOAS1431

Keywords: collective dependence, false discovery rate, information theoretic measure, multiple testing, protein coevolution, structural biology

Rights: Copyright © 2021 Institute of Mathematical Statistics

JOURNAL ARTICLE
23 PAGES
