Abstract
Reconstructing three-dimensional (3D) chromatin structure from conformation capture assays (such as Hi-C) is a critical task in computational biology, since chromatin spatial architecture plays a vital role in numerous cellular processes and direct imaging is challenging. Most existing algorithms that operate on Hi-C contact matrices produce reconstructed 3D configurations in the form of a polygonal chain. However, none of the methods exploit the fact that the target solution is a (smooth) curve in 3D: this contiguity attribute is either ignored or indirectly addressed by imposing spatial constraints that are challenging to formulate. In this paper we develop both B-spline and smoothing spline techniques for directly capturing this potentially complex 1D curve. We subsequently combine these techniques with a Poisson model for contact counts and compare their performance on a real data example. In addition, motivated by the sparsity of Hi-C contact data, especially when obtained from single-cell assays, we appreciably extend the class of distributions used to model contact counts. We build a general distribution-based metric scaling (DBMS) framework from which we develop zero-inflated and Hurdle Poisson models as well as negative binomial applications. Illustrative applications make recourse to bulk Hi-C data from IMR90 cells and single-cell Hi-C data from mouse embryonic stem cells.
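To make the central idea concrete, here is a minimal sketch of a spline-parameterised curve scored with a Poisson log-likelihood for contact counts. It is an illustration under stated assumptions, not the authors' implementation: the link log(lambda_ij) = beta - ||x_i - x_j||^2, the clamped cubic knot vector, the control points `theta`, and all function names are hypothetical choices made for this example.

```python
import math

def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at t (Cox-de Boor recursion)."""
    # Degree-0 indicators on half-open knot intervals.
    B = [1.0 if knots[i] <= t < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    if t == knots[-1]:
        # Close the last non-empty interval so the right endpoint is covered.
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                B[i] = 1.0
                break
    for d in range(1, degree + 1):
        new_B = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            den = knots[i + d] - knots[i]
            if den > 0:
                left = (t - knots[i]) / den * B[i]
            den = knots[i + d + 1] - knots[i + 1]
            if den > 0:
                right = (knots[i + d + 1] - t) / den * B[i + 1]
            new_B.append(left + right)
        B = new_B
    return B  # length len(knots) - degree - 1

def curve_points(ts, knots, degree, theta):
    """Map curve parameters ts to 3D points x(t) = sum_k B_k(t) * theta_k."""
    pts = []
    for t in ts:
        b = bspline_basis(t, knots, degree)
        pts.append(tuple(sum(bk * th[d] for bk, th in zip(b, theta))
                         for d in range(3)))
    return pts

def poisson_loglik(counts, pts, beta):
    """Poisson log-likelihood over pairs i < j.

    ASSUMPTION: log(lambda_ij) = beta - squared Euclidean distance.
    """
    total = 0.0
    n = len(pts)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(pts[i], pts[j]))
            log_lam = beta - d2
            c = counts[i][j]
            total += c * log_lam - math.exp(log_lam) - math.lgamma(c + 1)
    return total

# Toy setup: 6 loci on a clamped cubic spline with 5 hypothetical control points.
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
theta = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 1), (0, 0, 1)]
ts = [i / 5 for i in range(6)]
pts = curve_points(ts, knots, 3, theta)
counts = [[0] * 6 for _ in range(6)]  # placeholder contact matrix, not real data
print(poisson_loglik(counts, pts, beta=1.0))
```

In this sketch the smoothness of the reconstruction comes from the basis itself: the 3D coordinates of all loci are determined by a small number of control points, so fitting would amount to maximising the log-likelihood over `theta` and `beta`.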
Funding Statement
E.T. was partially supported by the Stanford Data Science Scholarship, by grant RGPIN-2023-04727 from Natural Sciences and Engineering Research Council of Canada, and by grant MC-2023-05 from the University of Toronto McLaughlin Center.
T.H. was partially supported by grants DMS-1407548 and IIS 1837931 from the National Science Foundation and grant 5R01 EB 001988-21 from the National Institutes of Health.
M.S. was partially supported by grant GM-109457 from the National Institutes of Health.
Acknowledgments
The authors express gratitude to the Associate Editor and reviewers for the helpful feedback, which included a critical assessment of our initial analysis and led to substantial improvements in the manuscript.
Citation
Elena Tuzhilina, Trevor Hastie, Mark Segal. "Statistical curve models for inferring 3D chromatin architecture." Ann. Appl. Stat. 18 (4): 2979-3006, December 2024. https://doi.org/10.1214/24-AOAS1917