December 2024
Statistical curve models for inferring 3D chromatin architecture
Elena Tuzhilina, Trevor Hastie, Mark Segal
Ann. Appl. Stat. 18(4): 2979-3006 (December 2024). DOI: 10.1214/24-AOAS1917


Reconstructing three-dimensional (3D) chromatin structure from conformation capture assays (such as Hi-C) is a critical task in computational biology, since chromatin spatial architecture plays a vital role in numerous cellular processes and direct imaging is challenging. Most existing algorithms that operate on Hi-C contact matrices produce reconstructed 3D configurations in the form of a polygonal chain. However, none of these methods exploits the fact that the target solution is a (smooth) curve in 3D: this contiguity attribute is either ignored or addressed only indirectly by imposing spatial constraints that are challenging to formulate. In this paper, we develop both B-spline and smoothing spline techniques for directly capturing this potentially complex 1D curve. We then combine these techniques with a Poisson model for contact counts and compare their performance on a real data example. In addition, motivated by the sparsity of Hi-C contact data, especially when obtained from single-cell assays, we appreciably extend the class of distributions used to model contact counts. We build a general distribution-based metric scaling (DBMS) framework, from which we develop zero-inflated and hurdle Poisson models as well as negative binomial applications. Illustrative applications use bulk Hi-C data from IMR90 cells and single-cell Hi-C data from mouse embryonic stem cells.
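The idea of pairing a B-spline curve with a Poisson model for contact counts can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the link function log λ_ij = β − ||x_i − x_j||², the clamped cubic knot vector, the simulated data, and the names `bspline_basis`, `pairwise_sq_dists` and `poisson_neg_loglik`. The sketch only evaluates the objective; it does not reproduce the paper's fitting algorithm.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at t (Cox-de Boor recursion)."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open knot intervals.
    B = np.zeros((len(t), len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    # Assign the right endpoint to the last non-empty interval.
    last = np.max(np.nonzero(np.diff(knots) > 0)[0])
    B[t == knots[-1], last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_new = np.zeros((len(t), len(knots) - d - 1))
        for j in range(len(knots) - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_new[:, j] += (t - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - t) / right_den * B[:, j + 1]
        B = B_new
    return B  # shape (len(t), len(knots) - degree - 1)

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def poisson_neg_loglik(theta, H, C, beta):
    """Poisson negative log-likelihood (up to a constant) for the curve X = H @ theta.

    ASSUMPTION: log(lambda_ij) = beta - ||x_i - x_j||^2 (illustrative link only).
    """
    log_lam = beta - pairwise_sq_dists(H @ theta)
    iu = np.triu_indices(C.shape[0], k=1)  # each unordered pair of loci once
    return np.sum(np.exp(log_lam[iu]) - C[iu] * log_lam[iu])

# Hypothetical toy setup: 40 loci on a cubic B-spline curve with 7 basis functions.
rng = np.random.default_rng(0)
n_loci, beta = 40, 2.0
t = np.linspace(0.0, 1.0, n_loci)
knots = np.array([0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1], dtype=float)
H = bspline_basis(t, knots, degree=3)       # (40, 7) basis matrix
theta_true = rng.normal(size=(H.shape[1], 3))  # spline coefficients in 3D

# Simulate a symmetric contact matrix from the assumed model.
lam = np.exp(beta - pairwise_sq_dists(H @ theta_true))
upper = np.triu(rng.poisson(lam), k=1)
C = upper + upper.T

nll_counts = poisson_neg_loglik(theta_true, H, C, beta)
```

Under these assumptions, the 3D configuration is parameterized by the 7 × 3 coefficient matrix `theta` rather than by 40 free points, which is how a spline basis builds the smooth-curve constraint into the reconstruction. The objective could then be passed to a general-purpose optimizer over `theta` and `beta`.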

Funding Statement

E.T. was partially supported by the Stanford Data Science Scholarship, by grant RGPIN-2023-04727 from Natural Sciences and Engineering Research Council of Canada, and by grant MC-2023-05 from the University of Toronto McLaughlin Center.
T.H. was partially supported by grants DMS-1407548 and IIS 1837931 from the National Science Foundation and grant 5R01 EB 001988-21 from the National Institutes of Health.
M.S. was partially supported by grant GM-109457 from the National Institutes of Health.


Acknowledgments

The authors express gratitude to the Associate Editor and reviewers for the helpful feedback, which included a critical assessment of our initial analysis and led to substantial improvements in the manuscript.


Keywords: conformation reconstruction, metric scaling, spatial structure, splines

Rights: Copyright © 2024 Institute of Mathematical Statistics

