Abstract
We study the problem of data-driven background estimation, which arises in searches for physics signals predicted by the Standard Model at the Large Hadron Collider. Our work is motivated by the search for the production of pairs of Higgs bosons decaying into four bottom quarks. A number of other physical processes, known as background, also share the same final state. The data arising in this problem is, therefore, a mixture of unlabeled background and signal events, and the primary aim of the analysis is to determine whether the proportion of unlabeled signal events is nonzero. A challenging but necessary first step is to estimate the distribution of background events. Past work in this area has determined regions of the space of collider events where signal is unlikely to appear and where the background distribution is, therefore, identifiable. The background distribution can be estimated in these regions and extrapolated into the region of primary interest using transfer learning with a multivariate classifier. We build upon this existing approach in two ways. First, we revisit this method by developing a customized residual neural network which is tailored to the structure and symmetries of collider data. Second, we develop a new method for background estimation, based on the optimal transport problem, which relies on modeling assumptions distinct from earlier work. These two methods can serve as cross-checks for each other in particle physics analyses, due to the complementarity of their underlying assumptions. We compare their performance on simulated double Higgs boson data.
Funding Statement
This work was supported in part by NSF Grants PHY-2020295, DMS-2053804, and DMS-2310632.
TM was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) through a PGS D Scholarship.
Acknowledgments
We are grateful to the CMU Statistical Methods for the Physical Sciences (STAMPS) research group for insightful discussions and feedback throughout this work.
Citation
Tudor Manole, Patrick Bryant, John Alison, Mikael Kuusela, Larry Wasserman. "Background modeling for double Higgs boson production: Density ratios and optimal transport." Ann. Appl. Stat. 18 (4) 2950-2978, December 2024. https://doi.org/10.1214/24-AOAS1916