September 2023
A dynamic screening algorithm for hierarchical binary marketing data
Yimei Fan, Yuan Liao, Ilya O. Ryzhov, Kunpeng Zhang
Ann. Appl. Stat. 17(3): 2326-2344 (September 2023). DOI: 10.1214/22-AOAS1720

Abstract

In many applications of business and marketing analytics, predictive models are fit using hierarchically structured data: common characteristics of products, customers, or web pages are represented as categorical variables, and each category can be split up into multiple subcategories at a lower level of the hierarchy. The model may thus contain hundreds of thousands of binary variables, necessitating the use of variable selection to screen out large numbers of irrelevant or insignificant features. We propose a new dynamic screening method, based on the distance correlation criterion, designed for hierarchical binary data. Our method can screen out large parts of the hierarchy at the higher levels, avoiding the need to explore many lower-level features and greatly reducing the computational cost of screening. The practical potential of the method is demonstrated in a case application on user-brand interaction data from Facebook.
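The idea of screening a hierarchy from the top down can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the fixed threshold, the toy categories, and the assumption that a parent indicator is the logical OR of its children are all hypothetical choices made for illustration. The sketch computes the standard sample distance correlation between each binary feature and the response, and skips an entire subtree as soon as its parent falls below the threshold.

```python
import math


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # row means (equal to column means by symmetry)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length numeric sequences."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def screen_hierarchy(roots, children, columns, y, threshold):
    """Top-down screening (hypothetical rule): a node whose distance
    correlation with y is below `threshold` is dropped together with
    all of its descendants, which are never evaluated."""
    kept, evaluated = [], 0
    stack = list(reversed(roots))
    while stack:
        node = stack.pop()
        evaluated += 1
        if distance_correlation(columns[node], y) < threshold:
            continue  # prune the whole subtree below this node
        kept.append(node)
        stack.extend(reversed(children.get(node, [])))
    return kept, evaluated


# Toy data (invented for illustration): 8 users, binary response y.
y = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "phones":     [1, 1, 0, 0, 0, 0, 0, 0],
    "laptops":    [0, 0, 1, 1, 0, 0, 0, 0],
    "fiction":    [1, 0, 0, 0, 1, 0, 0, 0],
    "nonfiction": [0, 1, 0, 0, 0, 1, 0, 0],
}
children = {"electronics": ["phones", "laptops"],
            "books": ["fiction", "nonfiction"]}
# Assumption: a parent indicator is the OR of its children's indicators.
for parent, kids in children.items():
    columns[parent] = [max(vals) for vals in zip(*(columns[k] for k in kids))]

kept, evaluated = screen_hierarchy(["electronics", "books"], children,
                                   columns, y, threshold=0.3)
```

In this toy example the "books" category is uncorrelated with the response, so its two subcategories are never examined: `kept` is `["electronics", "phones", "laptops"]` and only 4 of the 6 features are evaluated. For binary features the sample distance correlation coincides with the absolute Pearson correlation, which makes small cases like this easy to check by hand.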

Acknowledgments

The authors thank two anonymous referees, an associate editor and the editor for their constructive comments, which helped to improve this paper.

Citation

Yimei Fan, Yuan Liao, Ilya O. Ryzhov, Kunpeng Zhang. "A dynamic screening algorithm for hierarchical binary marketing data." Ann. Appl. Stat. 17(3): 2326-2344, September 2023. https://doi.org/10.1214/22-AOAS1720

Information

Received: 1 April 2021; Revised: 1 September 2022; Published: September 2023
First available in Project Euclid: 7 September 2023

MathSciNet: MR4637669
Digital Object Identifier: 10.1214/22-AOAS1720

Keywords: distance correlation, hierarchical data, sure screening

Rights: Copyright © 2023 Institute of Mathematical Statistics

Vol. 17 • No. 3 • September 2023