Abstract
Customer segmentation has wide applications in business activities, such as personalized marketing and targeted product development. Clustering methods are commonly used to realize customer segmentation. However, modern customer data are often high-dimensional and contain mixed-type variables (i.e., a mixture of continuous and categorical variables). This poses great challenges for customer segmentation, because most existing clustering methods are designed for data with a single type of variable. Furthermore, the presence of noise variables highlights the need for simultaneous variable selection and data clustering. Motivated by these issues, we develop a Davies–Bouldin index based sparse clustering (DBI-SC) method for customer segmentation with high-dimensional mixed-type data. In this method, we define dissimilarity measures for continuous variables and categorical variables separately. An adjusted DBI criterion is then designed to measure the contribution of each variable to clustering. For variable selection, we apply the sparse clustering framework and introduce different penalty parameters for the mixed-type variables. We also investigate the screening consistency property of the DBI-SC method. Extensive simulation studies demonstrate the satisfactory performance of the DBI-SC method in both clustering and variable selection. Finally, we apply the proposed method to a designated driving service dataset for customer segmentation.
Funding Statement
Dr. Yang Li (corresponding author) is supported by the National Natural Science Foundation of China (72271237) and the Platform of Public Health & Disease Control and Prevention, Major Innovation & Planning Interdisciplinary Platform for the “Double-First Class” Initiative, Renmin University of China. Feifei Wang’s research is supported by the National Natural Science Foundation of China (Nos. 72371241, 72171229), the MOE Project of Key Research Institute of Humanities and Social Sciences (22JJD910001), and the Chinese National Statistical Science Research Project (2022LD06).
Acknowledgments
The authors would like to thank Mr. Qi Lu for productive discussions.
Citation
Feifei Wang, Shaodong Xu, Yichen Qin, Ye Shen, Yang Li. "Sparse clustering for customer segmentation with high-dimensional mixed-type data." Ann. Appl. Stat. 18 (3) 2382–2402, September 2024. https://doi.org/10.1214/24-AOAS1886
Information