Hierarchically associated covariates are common in many fields, and it is often of interest to incorporate this hierarchical information into statistical inference. This paper proposes a novel way to explicitly integrate the information in a given hierarchical tree of covariates into high-dimensional model selection. Specifically, a set of hierarchical scores is introduced to quantify the positions of the terminal nodes in the tree, where a terminal node represents either a single covariate or a group of covariates. These scores are then used to weight the corresponding penalty terms in a model selection approach. We show that the proposed estimation approach has a hierarchical grouping property: two highly correlated covariates that are close to each other in the hierarchical tree are more likely to be included in, or excluded from, the model together than two that are far apart. We also prove model selection consistency of the proposed estimator both between and within groups. The theoretical results are illustrated by simulation and by a real-data analysis of the Systemic Lupus Erythematosus (SLE) dataset.
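To make the idea of score-weighted penalty terms concrete, the following is a minimal sketch of an elastic net in which each coefficient's L1 penalty is multiplied by a per-covariate weight. It is an illustration under stated assumptions, not the estimator defined in the paper: the objective form, the function names `soft_threshold` and `weighted_elastic_net`, the coordinate-descent solver, and the numeric weights are all hypothetical. In particular, the weights below are placeholder numbers and are not computed from any hierarchical tree.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_elastic_net(X, y, weights, lam1, lam2, n_iter=200):
    """Coordinate descent for the (assumed) objective

        (1 / 2n) * ||y - X b||^2
        + lam1 * sum_j weights[j] * |b_j|
        + (lam2 / 2) * sum_j b_j^2

    X is a list of rows, y a list of responses, and weights[j] stands in
    for a hierarchical score of covariate j (hypothetical values).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = [float(v) for v in y]  # equals y - X b while b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual (b_j removed).
            rho = sum(
                X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)
            ) / n
            # The weight rescales the L1 threshold for this covariate only.
            new_bj = soft_threshold(rho, lam1 * weights[j]) / (col_sq[j] + lam2)
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Two orthogonal covariates with identical signal but different weights.
X = [[1, 0], [0, 1]]
y = [1, 1]
beta = weighted_elastic_net(X, y, weights=[1.0, 3.0], lam1=0.2, lam2=0.0)
print(beta)
```

In this toy example both covariates explain the response equally well, yet the one with the larger weight is dropped (its coefficient is exactly zero) while the other is retained with a coefficient of approximately 0.6. This is the mechanism by which a per-covariate weight can steer which covariates enter the model.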
"Model selection of hierarchically structured covariates using elastic net." Electron. J. Statist. 10 (2) 3775 - 3806, 2016. https://doi.org/10.1214/16-EJS1217