Abstract
In immunology studies, flow cytometry is a commonly used multivariate single-cell assay. One key goal in flow cytometry analysis is to detect the immune cells responsive to certain stimuli. Statistically, this problem can be translated into comparing two protein expression probability density functions (pdfs) before and after the stimulus; the goal is to pinpoint the regions where these two pdfs differ. Further screening of these differential regions can be performed to identify enriched sets of responsive cells. In this paper we model identifying differential density regions as a multiple testing problem. First, we partition the sample space into small bins. In each bin we form a hypothesis to test the existence of differential pdfs. Second, we develop a novel multiple testing method, called TEAM (testing on the aggregation tree method), to identify those bins that harbor differential pdfs while controlling the false discovery rate (FDR) at the desired level. TEAM embeds the testing procedure into an aggregation tree to test from fine to coarse resolution. The procedure achieves the statistical goal of pinpointing density differences to the smallest possible regions. TEAM is computationally efficient and can analyze large flow cytometry data sets in much less time than competing methods. We applied TEAM and competing methods to a flow cytometry data set to identify T cells responsive to the cytomegalovirus (CMV)-pp65 antigen stimulation. With additional downstream screening, TEAM successfully identified enriched sets containing monofunctional, bifunctional, and polyfunctional T cells. Competing methods either did not finish in a reasonable time frame or provided less interpretable results. Numerical simulations and theoretical justifications demonstrate that TEAM has asymptotically valid, powerful, and robust performance.
Overall, TEAM is a computationally efficient and statistically powerful algorithm that can yield meaningful biological insights in flow cytometry studies.
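The bin-level testing idea described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not TEAM: it omits the aggregation tree entirely and uses the standard Benjamini-Hochberg procedure for FDR control. It assumes one-dimensional data, equal-count bins on the pooled sample, and an approximately binomial count of post-stimulus cells per bin under the null. All function names and parameters (differential_bins, bin_size, alpha) are hypothetical and chosen only for illustration.

```python
import math


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(pvals, alpha):
    """Indices (ascending) of hypotheses rejected by the BH step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            cutoff = rank  # largest rank whose p-value clears its threshold
    return sorted(order[:cutoff])


def differential_bins(before, after, bin_size=50, alpha=0.05):
    """Flag bins where the 'after' sample is enriched relative to 'before'.

    Simplified single-layer illustration (an assumption, not the paper's
    algorithm): pool both samples, cut the sorted pool into bins of
    bin_size points, and test each bin with a one-sided binomial test.
    """
    # Label 0 = before stimulus, 1 = after stimulus.
    pooled = sorted([(x, 0) for x in before] + [(x, 1) for x in after])
    theta0 = len(after) / len(pooled)  # expected 'after' share under the null
    bins, pvals = [], []
    # Leftover points that do not fill a whole bin are ignored.
    for start in range(0, len(pooled) - bin_size + 1, bin_size):
        chunk = pooled[start:start + bin_size]
        k = sum(label for _, label in chunk)  # number of 'after' points in bin
        bins.append((chunk[0][0], chunk[-1][0]))  # (lower edge, upper edge)
        pvals.append(binom_upper_tail(k, bin_size, theta0))
    return [bins[i] for i in benjamini_hochberg(pvals, alpha)]


if __name__ == "__main__":
    # Toy data: identical background, plus an extra 'after' cluster near 10.
    before = [i / 100 for i in range(200)]
    after = [i / 100 for i in range(200)] + [10 + i / 100 for i in range(100)]
    print(differential_bins(before, after))
```

In this toy example the flagged bins should cover only the extra cluster in the post-stimulus sample. A single fixed bin size forces a trade-off between resolution and power, which is the limitation that testing from fine to coarse resolution is meant to address.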
Funding Statement
Flow cytometric data generation was supported in whole or in part through an EQAPOL collaboration with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Contract Number HHSN272201700061C.
John Pura’s research was supported by NSF DGE 1545220 and the NIH training grant T32HL079896.
Cliburn Chan’s research was supported by EQAPOL Contract Number HHSN272201700061C, the Duke University Center for AIDS Research (CFAR), and an NIH-funded program 5P30 AI064518.
Xuechan Li’s and Jichun Xie’s research was supported by Duke University.
Acknowledgments
We thank the Editors, Associate Editors, and the reviewers for providing valuable comments that helped us substantially improve the paper’s quality.
Citation
John A. Pura, Xuechan Li, Cliburn Chan, Jichun Xie. "TEAM: A multiple testing algorithm on the aggregation tree for flow cytometry analysis." Ann. Appl. Stat. 17 (1) 621-640, March 2023. https://doi.org/10.1214/22-AOAS1645