September 2024
Sparse clustering for customer segmentation with high-dimensional mixed-type data
Feifei Wang, Shaodong Xu, Yichen Qin, Ye Shen, Yang Li
Ann. Appl. Stat. 18(3): 2382-2402 (September 2024). DOI: 10.1214/24-AOAS1886

Abstract

Customer segmentation is widely used in business, for example in personalized marketing and targeted product development, and it is commonly carried out with clustering methods. Modern customer data, however, are high-dimensional and of mixed type (i.e., they contain both continuous and categorical variables). This poses a major challenge, because most existing clustering methods are designed for data with a single type of variable. Furthermore, the presence of noise variables makes it necessary to perform variable selection and clustering simultaneously. Motivated by these issues, we develop a Davies–Bouldin index based sparse clustering (DBI-SC) method for customer segmentation with high-dimensional mixed-type data. In this method, we define dissimilarity measures for continuous variables and categorical variables separately. An adjusted DBI criterion is then designed to measure the contribution of each variable to clustering. For variable selection, we apply the sparse clustering framework and introduce different penalty parameters for continuous and categorical variables. The screening consistency property of the DBI-SC method is also investigated. Extensive simulation studies demonstrate the satisfactory performance of the DBI-SC method in both clustering and variable selection. Finally, the proposed method is applied to segment customers in a designated driving service dataset.
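
The abstract does not state the adjusted DBI formula, so the following is only an illustrative sketch of the general idea: scoring a single variable with a Davies–Bouldin-style ratio of within-cluster scatter to between-cluster separation, using a different dissimilarity for continuous and categorical variables. The function names (`d_continuous`, `d_categorical`, `per_variable_dbi`), the pairwise definitions of scatter and separation, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations, product


def d_continuous(a, b):
    # Assumed dissimilarity for a continuous variable: absolute difference.
    return abs(a - b)


def d_categorical(a, b):
    # Assumed dissimilarity for a categorical variable: simple mismatch (0/1).
    return 0.0 if a == b else 1.0


def _mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def per_variable_dbi(values, labels, dissimilarity):
    """Davies-Bouldin-style score of ONE variable for a given clustering.

    Lower values mean tighter, better separated clusters on this variable.
    This is a generic pairwise variant, not the paper's adjusted DBI.
    """
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    if len(groups) < 2:
        raise ValueError("need at least two clusters")

    # Within-cluster scatter: mean dissimilarity over distinct member pairs.
    scatter = {}
    for g, members in groups.items():
        pairs = list(combinations(members, 2))
        scatter[g] = _mean(dissimilarity(a, b) for a, b in pairs) if pairs else 0.0

    def separation(g, h):
        # Between-cluster separation: mean dissimilarity across the two clusters.
        return _mean(dissimilarity(a, b) for a, b in product(groups[g], groups[h]))

    worst_ratios = []
    for g in groups:
        ratios = []
        for h in groups:
            if h == g:
                continue
            sep = separation(g, h)
            ratios.append(float("inf") if sep == 0 else (scatter[g] + scatter[h]) / sep)
        worst_ratios.append(max(ratios))
    return _mean(worst_ratios)


# Toy data (made up): two clusters of three customers each.
labels = [0, 0, 0, 1, 1, 1]
spend = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]             # separates the clusters
noise = [10.0, 50.0, 30.0, 12.0, 52.0, 31.0]             # unrelated to the clusters
channel = ["app", "app", "app", "phone", "phone", "app"]  # categorical

dbi_spend = per_variable_dbi(spend, labels, d_continuous)
dbi_noise = per_variable_dbi(noise, labels, d_continuous)
dbi_channel = per_variable_dbi(channel, labels, d_categorical)
```

On this toy data the informative variable `spend` receives a lower score than `noise`, which is the kind of per-variable signal a sparse clustering procedure could use to down-weight noise variables. How the paper turns such scores into penalized variable weights is not described in the abstract and is not attempted here.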

Funding Statement

Dr. Yang Li (corresponding author) is supported by the National Natural Science Foundation of China (72271237) and the Platform of Public Health & Disease Control and Prevention, Major Innovation & Planning Interdisciplinary Platform for the “Double-First Class” Initiative, Renmin University of China. Feifei Wang’s research is supported by the National Natural Science Foundation of China (Nos. 72371241, 72171229), the MOE Project of Key Research Institute of Humanities and Social Sciences (22JJD910001), and the Chinese National Statistical Science Research Project (2022LD06).

Acknowledgments

The authors would like to thank Mr. Qi Lu for his productive discussion.

Citation

Feifei Wang, Shaodong Xu, Yichen Qin, Ye Shen, Yang Li. "Sparse clustering for customer segmentation with high-dimensional mixed-type data." Ann. Appl. Stat. 18 (3) 2382-2402, September 2024. https://doi.org/10.1214/24-AOAS1886

Information

Received: 1 January 2023; Revised: 1 February 2024; Published: September 2024
First available in Project Euclid: 5 August 2024

Digital Object Identifier: 10.1214/24-AOAS1886

Keywords: customer segmentation, Davies–Bouldin index (DBI), heterogeneity analysis, unsupervised learning, variable selection

Rights: Copyright © 2024 Institute of Mathematical Statistics

JOURNAL ARTICLE
21 PAGES

