Abstract
Mutations in noncoding DNA, which makes up approximately 99% of the human genome, are crucial to understanding disease mechanisms because they can dysregulate disease-associated genes. One key element of gene regulation that noncoding mutations affect is the binding of proteins to DNA sequences. Insertions and deletions of bases (InDels) are the second most common type of mutation, after single nucleotide polymorphisms, and may impact protein-DNA binding. However, no existing method can estimate and test the effects of InDels on protein-DNA binding. We develop a novel test of statistical significance, the binding change test (BC test), which uses a Markov model to evaluate the impact of InDels and identify those that alter protein-DNA binding. The test predicts binding-changer InDels of regulatory significance using an efficient importance sampling algorithm that generates background sequences favoring large binding affinity changes. Simulation studies demonstrate its excellent performance. An application to human leukemia data uncovers, in critical cis-regulatory elements, candidate pathological InDels that modulate transcription factor (TF) binding in leukemic patients. We provide an R package, atIndel, which is available on GitHub.
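To illustrate the general idea behind importance sampling for small tail probabilities, the following Python sketch estimates the probability that a background sequence reaches a high motif score. It is not the BC test and not the atIndel implementation. Everything in it is an assumption made for illustration: the toy 4-column position weight matrix `PWM`, the uniform i.i.d. background (a simplification of the Markov model mentioned above), the threshold, and the choice of the motif itself as the proposal distribution. Because the motif is short, the exact answer can be enumerated and compared with the estimate.

```python
import itertools
import math
import random

BASES = "ACGT"

# Assumed for illustration: a uniform i.i.d. background, not a fitted Markov model.
BACKGROUND = {b: 0.25 for b in BASES}

# Hypothetical toy position weight matrix with consensus "ACGT".
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def score(seq):
    """Log-likelihood ratio of the motif versus the background for one sequence."""
    return sum(math.log(col[b] / BACKGROUND[b]) for col, b in zip(PWM, seq))


def exact_tail(threshold):
    """P(score >= threshold) under the background, by enumerating all sequences."""
    total = 0.0
    for seq in itertools.product(BASES, repeat=len(PWM)):
        if score(seq) >= threshold:
            total += math.prod(BACKGROUND[b] for b in seq)
    return total


def importance_sampling_tail(threshold, n_samples, rng):
    """Estimate the same tail probability by sampling from the motif itself.

    Sequences are drawn from the PWM columns, so high scores are common.
    Each draw is reweighted by background probability / proposal probability,
    which for this proposal equals exp(-score).
    """
    total = 0.0
    for _ in range(n_samples):
        seq = [rng.choices(BASES, weights=[col[b] for b in BASES])[0] for col in PWM]
        s = score(seq)
        if s >= threshold:
            total += math.exp(-s)
    return total / n_samples


rng = random.Random(0)
exact = exact_tail(2.0)
estimate = importance_sampling_tail(2.0, 20000, rng)
print(f"exact = {exact:.5f}, importance sampling estimate = {estimate:.5f}")
```

In this toy setting, a threshold of 2.0 corresponds to at most one mismatch against the consensus, so the exact probability is 13/256. Drawing from the motif rather than the background concentrates the samples in the high-score region, which is the same broad motivation as generating background sequences in favor of large binding affinity changes.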
Funding Statement
Shin was supported in part by U.S. NSF Grant DMS-2113674, Korean NRF grant funded by the Korea government (MSIT) (RS-2023-00243012, RS-2023-00219980), POSTECH Basic Science Research Institute Fund (NRF-2021R1A6A1A10042944), and POSCO HOLDINGS grant 2023Q033.
Xu is a Scholar of The Leukemia & Lymphoma Society (LLS) and an American Society of Hematology (ASH) Scholar.
Acknowledgments
The authors are grateful to Dr. Michael Q. Zhang and Dr. Zhenyu Xuan at University of Texas at Dallas for helpful discussions.
Citation
Qinyi Zhou, Chandler Zuo, Yuannyu Zhang, Min Chen, Jian Xu, Sunyoung Shin. "Scalable test of statistical significance for protein-DNA binding changes with insertion and deletion of bases in the genome." Ann. Appl. Stat. 18 (4) 3528-3548, December 2024. https://doi.org/10.1214/24-AOAS1950
Information