December 2023
Binned multinomial logistic regression for integrative cell-type annotation
Keshav Motwani, Rhonda Bacher, Aaron J. Molstad
Ann. Appl. Stat. 17(4): 3426-3449 (December 2023). DOI: 10.1214/23-AOAS1769

Abstract

Categorizing individual cells into one of many known cell-type categories, also known as cell-type annotation, is a critical step in the analysis of single-cell genomics data. The current process of annotation is time-intensive and subjective, which has led different studies to describe cell types with labels of varying degrees of resolution. While supervised learning approaches have provided automated solutions to annotation, fitting a unified model to multiple datasets with inconsistent labels remains a significant challenge. In this article we propose a new multinomial logistic regression estimator that models cell-type probabilities by integrating multiple datasets with labels of varying resolution. To compute our estimator, we solve a nonconvex optimization problem using a blockwise proximal gradient descent algorithm. We show through simulation studies that our approach estimates cell-type probabilities more accurately than competitors in a wide variety of scenarios. We apply our method to 10 single-cell RNA-seq datasets and demonstrate its utility in predicting fine-resolution cell-type labels on unlabeled data as well as refining cell-type labels on data with existing coarse-resolution annotations. Finally, we demonstrate that our method can lead to novel scientific insights in the context of a differential expression analysis comparing peripheral blood gene expression before and after treatment with interferon-β. An R package implementing the method is available in the Supplementary Material and at https://github.com/keshav-motwani/IBMR, and the collection of datasets we analyze is available at https://github.com/keshav-motwani/AnnotatedPBMC.
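The idea of combining labels of varying resolution can be illustrated with a "binned" likelihood: a coarse label is treated as a set (bin) of fine cell types, and its probability is the sum of the fine-type probabilities from a multinomial logistic model. The sketch below is a minimal illustration in plain Python under stated assumptions: a single coefficient matrix with no intercept, and a row-wise group-lasso penalty (suggested by the keywords, not confirmed by this abstract). It shows the binned negative log-likelihood, its gradient, and one proximal gradient step. It is not the authors' blockwise algorithm or the IBMR package API, and every function name here (`binned_nll`, `prox_grad_step`, etc.) is hypothetical.

```python
import math

def softmax(eta):
    """Numerically stable softmax of a list of linear predictors."""
    m = max(eta)
    exps = [math.exp(e - m) for e in eta]
    s = sum(exps)
    return [e / s for e in exps]

def fine_probs(x, B):
    """Fine cell-type probabilities pi_k(x); B has p rows (features) and K columns.
    Assumption for brevity: no intercept term."""
    K = len(B[0])
    eta = [sum(x[j] * B[j][k] for j in range(len(x))) for k in range(K)]
    return softmax(eta)

def binned_nll(X, bins, B):
    """Average negative log-likelihood when observation i is only known to lie
    in the set bins[i] of fine types: P(bin) = sum of pi_k over k in the bin."""
    total = 0.0
    for x, g in zip(X, bins):
        pi = fine_probs(x, B)
        total -= math.log(sum(pi[k] for k in g))
    return total / len(X)

def binned_grad(X, bins, B):
    """Gradient of binned_nll w.r.t. B, using
    d(-log P(bin)) / d eta_k = pi_k - 1[k in bin] * pi_k / sum_{l in bin} pi_l."""
    p, K, n = len(B), len(B[0]), len(X)
    grad = [[0.0] * K for _ in range(p)]
    for x, g in zip(X, bins):
        pi = fine_probs(x, B)
        mass = sum(pi[k] for k in g)
        d = [pi[k] - (pi[k] / mass if k in g else 0.0) for k in range(K)]
        for j in range(p):
            for k in range(K):
                grad[j][k] += x[j] * d[k] / n
    return grad

def group_lasso_prox(row, threshold):
    """Proximal operator of threshold * ||row||_2: shrinks the whole row toward zero."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm <= threshold:
        return [0.0] * len(row)
    scale = 1.0 - threshold / norm
    return [scale * v for v in row]

def prox_grad_step(X, bins, B, step, lam):
    """One proximal gradient step on binned_nll + lam * sum_j ||B[j, :]||_2."""
    grad = binned_grad(X, bins, B)
    return [group_lasso_prox([b - step * gb for b, gb in zip(B[j], grad[j])], step * lam)
            for j in range(len(B))]

# Toy usage: 3 cells, 2 features, 3 fine types.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bins = [{0}, {1, 2}, {2}]  # the second cell has only a coarse label covering types 1 and 2
B0 = [[0.0] * 3 for _ in range(2)]
B1 = prox_grad_step(X, bins, B0, step=0.5, lam=0.0)
print(binned_nll(X, bins, B1) < binned_nll(X, bins, B0))  # True
```

Because a singleton bin recovers the ordinary multinomial likelihood, fine and coarse labels can share one model; summing probabilities inside the logarithm is also what makes the objective nonconvex in general.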

Funding Statement

Keshav Motwani’s research was supported by the Goldwater Foundation as well as the University Scholars Program at the University of Florida. Aaron J. Molstad’s research was supported by National Science Foundation grant DMS-2113589.

Acknowledgments

The authors thank the Editor, Associate Editor, and three referees for their helpful comments.

Citation


Keshav Motwani, Rhonda Bacher, Aaron J. Molstad. "Binned multinomial logistic regression for integrative cell-type annotation." Ann. Appl. Stat. 17(4): 3426-3449, December 2023. https://doi.org/10.1214/23-AOAS1769

Information

Received: 1 January 2022; Revised: 1 March 2023; Published: December 2023
First available in Project Euclid: 30 October 2023

MathSciNet: MR4661705
Digital Object Identifier: 10.1214/23-AOAS1769

Keywords: cell-type annotation, group lasso, integrative analysis, multinomial logistic regression, nonconvex optimization, single-cell genomics

Rights: Copyright © 2023 Institute of Mathematical Statistics
