Open Access
2024 Simultaneous factors selection and fusion of their levels in penalized logistic regression
Lea Kaufmann, Maria Kateri
Electron. J. Statist. 18(2): 4235-4291 (2024). DOI: 10.1214/24-EJS2296

Abstract

Many data analysis problems today are high-dimensional and require a reduction in model complexity. Under the sparsity assumption, variable selection is feasible, removing the non-influential explanatory variables. When factors are present and their levels are dummy coded, the number of model parameters grows rapidly, leading to high-dimensional problems even with a moderate number of factors. This emphasizes the need for a drastic parameter reduction, not only through variable selection but also through fusion of factor levels. The levels fused are those that do not differ significantly in their influence on the response variable. Such fusions, beyond reducing the dimension of the model, suggest scale adjustments for categorical predictors. In this work, a new regularization technique for binary logistic regression is introduced, called L0-fused group lasso (L0-FGL). It uses a group lasso penalty for factor selection and, for the fusion part, applies an L0 penalty to the differences among the level parameters of a categorical predictor. Using adaptive weights, the adaptive version of the L0-FGL method is derived. Theoretical properties, such as existence, √n consistency, and oracle properties under certain conditions, are established. In addition, it is shown that even in the diverging case, where the number of parameters p_n grows with the sample size n, √n consistency and a variable selection consistency result are achieved, as well as a corresponding asymptotic normality result for an approximate L0-FGL solution. Two computational methods are developed and implemented: penalized iteratively reweighted least squares (PIRLS) and a block coordinate descent (BCD) approach using quasi-Newton. A simulation study supports the outstanding performance of L0-FGL, especially in cases with a large number of factors. Finally, we apply our method to a real dataset on breast cancer recurrence events.
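To make the structure of the penalty described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It evaluates a group lasso term plus an L0 count of non-zero differences between level parameters for each factor. The function name `l0_fgl_penalty`, the two tuning parameters, the square-root group-size scaling, the use of all pairwise differences, and the inclusion of a reference level fixed at zero are all assumptions made for illustration; the abstract does not specify these details, and the adaptive weights are omitted.

```python
import numpy as np
from itertools import combinations


def l0_fgl_penalty(beta_groups, lam_select, lam_fuse, tol=1e-8):
    """Illustrative penalty value (hypothetical form, see assumptions above).

    beta_groups: list of 1-D sequences, one per factor, holding the
                 dummy-coded level parameters (reference level excluded).
    lam_select:  weight of the group lasso (factor selection) term.
    lam_fuse:    weight of the L0 (level fusion) term.
    """
    select_term = 0.0
    fuse_term = 0
    for beta_j in beta_groups:
        beta_j = np.asarray(beta_j, dtype=float)
        # Group lasso: Euclidean norm of the whole factor, scaled by
        # sqrt(group size) (a common convention, assumed here).
        select_term += np.sqrt(beta_j.size) * np.linalg.norm(beta_j)
        # L0 fusion: count pairs of levels whose parameters differ.
        # The reference level (parameter 0) is included by assumption.
        levels = np.concatenate(([0.0], beta_j))
        fuse_term += sum(abs(a - b) > tol for a, b in combinations(levels, 2))
    return lam_select * select_term + lam_fuse * fuse_term


# One factor with two fused levels and one level equal to the reference,
# and a second factor that is removed entirely (all parameters zero).
example = [[0.5, 0.5, 0.0], [0.0, 0.0]]
print(l0_fgl_penalty(example, lam_select=1.0, lam_fuse=1.0))
```

In this sketch a factor whose parameters are all zero contributes nothing to either term, while fusing two levels lowers only the L0 count, which mirrors the division of labor between selection and fusion stated in the abstract.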

Acknowledgments

We would like to express our sincere thanks to the associate editor and the reviewers for their useful and constructive comments which helped us improve the paper.

Citation

Lea Kaufmann. Maria Kateri. "Simultaneous factors selection and fusion of their levels in penalized logistic regression." Electron. J. Statist. 18 (2) 4235 - 4291, 2024. https://doi.org/10.1214/24-EJS2296

Information

Received: 1 October 2023; Published: 2024
First available in Project Euclid: 12 November 2024

Digital Object Identifier: 10.1214/24-EJS2296

Keywords: block coordinate descent (BCD) method, group lasso, High-dimensional statistics, L0 norm, L1 norm, Lasso, √n consistency, PIRLS algorithm

Vol.18 • No. 2 • 2024