Abstract
High-throughput sequencing technology allows us to test for differences in bacterial composition between populations. One important feature of human microbiome data is that it often includes a large number of zeros. Such data can be treated as being generated from a two-part model that includes a zero-point mass. Motivated by the analysis of such nonnegative data with excessive zeros, we introduce several truncated rank-based two-group and multigroup tests, including a truncated rank-based Wilcoxon rank-sum test for two-group comparisons and two truncated Kruskal–Wallis tests for multigroup comparisons. We show, both analytically through asymptotic relative efficiency analysis and by simulations, that the proposed tests have higher power than the standard rank-based tests in typical microbiome data settings, especially when the proportion of zeros in the data is high. The tests can also be applied to repeated measurements of compositional data via simple within-subject permutations. In a simple before-and-after treatment experiment, the within-subject permutation is similar to the paired rank test; however, the proposed tests handle the excessive zeros, which leads to better power. We apply the tests to compare the microbiome compositions of healthy children and pediatric Crohn’s disease patients and to assess the treatment effects on microbiome compositions. We identify several bacterial genera that are missed by the standard rank-based tests.
Funding Statement
This research was supported by NIH Grants GM129781 and GM123056.
Acknowledgments
We thank Dr. Kafadar, the Associate Editor, and two reviewers for many helpful comments and suggestions.
Citation
Wanjie Wang, Eric Chen, Hongzhe Li. "Truncated rank-based tests for two-part models with excessive zeros and applications to microbiome data." Ann. Appl. Stat. 17 (2) 1663–1680, June 2023. https://doi.org/10.1214/22-AOAS1688
Information