Open Access
StarTrek: Combinatorial variable selection with false discovery rate control
Lu Zhang, Junwei Lu
Ann. Statist. 52(1): 78-102 (February 2024). DOI: 10.1214/23-AOS2296

Abstract

Variable selection on large-scale networks has been extensively studied in the literature. While most existing methods are limited to local functionals, especially graph edges, this paper focuses on selecting the discrete hub structures of networks. Specifically, we propose an inferential method, called the StarTrek filter, to select the hub nodes whose degrees are larger than a given threshold in high-dimensional graphical models while controlling the false discovery rate (FDR). Discovering hub nodes in networks is challenging: because of the combinatorial structure, there is no straightforward statistic for testing the degree of a node, and the complicated dependence in the resulting multiple testing problem is hard to characterize and control. Methodologically, the StarTrek filter overcomes these difficulties by constructing p-values based on maximum test statistics via the Gaussian multiplier bootstrap. Theoretically, we show that the StarTrek filter controls the FDR by providing accurate bounds on the approximation errors of the quantile estimation and by addressing the dependence structures among the maximal statistics.
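The exact StarTrek statistic for a node's degree is combinatorial and is not reproduced here. As a minimal sketch, assuming a generic max-type statistic over d coordinates per hypothesis (for instance, hypothetical standardized edge scores attached to one node), the code below shows how a Gaussian multiplier bootstrap turns a maximum statistic into a p-value, and then applies the Benjamini-Hochberg step-up rule across hypotheses. The function names `max_statistic`, `multiplier_bootstrap_pvalue` and `benjamini_hochberg` are illustrative, and the use of Benjamini-Hochberg as the multiple-testing step is an assumption of this sketch rather than a statement of the paper's exact procedure.

```python
import math
import random


def max_statistic(data):
    """Return T = max_j |sqrt(n) * mean_j| and the column means.

    `data` is an n x d list of lists: n observations of d coordinates
    (hypothetical scores attached to one hypothesis, e.g. one node).
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    t_obs = max(abs(math.sqrt(n) * m) for m in means)
    return t_obs, means


def multiplier_bootstrap_pvalue(data, num_boot=500, seed=0):
    """p-value of the max statistic via the Gaussian multiplier bootstrap.

    Each replicate draws i.i.d. N(0, 1) multipliers e_1..e_n and computes
    T* = max_j |sum_i e_i (X_ij - mean_j)| / sqrt(n), which approximates the
    null distribution of T without estimating the d x d covariance matrix.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    t_obs, means = max_statistic(data)
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    exceed = 0
    for _ in range(num_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        t_boot = max(
            abs(sum(e[i] * centered[i][j] for i in range(n))) / math.sqrt(n)
            for j in range(d)
        )
        if t_boot >= t_obs:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (num_boot + 1)


def benjamini_hochberg(pvalues, alpha=0.1):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_max = rank  # largest rank whose p-value clears its threshold
    return sorted(order[:k_max])


if __name__ == "__main__":
    rng = random.Random(1)
    n, d = 50, 5
    hypotheses = []
    for k in range(6):
        shift = 2.0 if k < 2 else 0.0  # hypotheses 0 and 1 carry signal
        hypotheses.append(
            [[rng.gauss(shift if j == 0 else 0.0, 1.0) for j in range(d)]
             for _ in range(n)]
        )
    pvals = [multiplier_bootstrap_pvalue(x, num_boot=200, seed=k)
             for k, x in enumerate(hypotheses)]
    print(benjamini_hochberg(pvals, alpha=0.1))
```

The bootstrap avoids estimating a d x d covariance matrix, which is the practical appeal of the multiplier approach in high dimensions; the dependence among the resulting p-values is exactly what the theory in the paper has to handle.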

To this end, we establish novel Cramér-type comparison bounds for high-dimensional Gaussian random vectors. Compared with the Gaussian comparison bound in Kolmogorov distance established by Chernozhukov, Chetverikov and Kato (Ann. Statist. 42 (2014) 1787–1818), our Cramér-type comparison bounds control the relative difference between the distribution functions of two high-dimensional Gaussian random vectors, which is essential for the theoretical analysis of FDR control. Moreover, the StarTrek filter can be applied to general statistical models for FDR control when discovering discrete structures, for example when simultaneously testing the sparsity levels of multiple high-dimensional linear models. We illustrate the validity of the StarTrek filter through a series of numerical experiments and apply it to the Genotype-Tissue Expression (GTEx) dataset to discover central regulator genes.
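Why a relative (Cramér-type) comparison matters can be illustrated in one dimension, as a simplified sketch that does not reproduce the paper's high-dimensional bound. Two centered Gaussians with standard deviations 1 and 1.05 are within about 0.01 in Kolmogorov distance, yet their upper-tail probabilities differ by a large factor far in the tail. Since FDR procedures act on small p-values, which are exactly such tail probabilities, an absolute bound alone says little there. The helper names `normal_sf` and `tail_gaps` are hypothetical.

```python
import math


def normal_sf(t, sigma):
    """Upper-tail probability P(Z > t) for Z ~ N(0, sigma^2)."""
    return 0.5 * math.erfc(t / (sigma * math.sqrt(2.0)))


def tail_gaps(t, sigma1, sigma2):
    """Absolute and relative differences between the two upper tails at t."""
    s1, s2 = normal_sf(t, sigma1), normal_sf(t, sigma2)
    return abs(s1 - s2), abs(s1 / s2 - 1.0)


if __name__ == "__main__":
    # The absolute gap vanishes in the tail while the relative gap grows.
    for t in (1.0, 3.0, 5.0, 7.0):
        absolute, relative = tail_gaps(t, 1.0, 1.05)
        print(f"t = {t}: absolute gap {absolute:.2e}, relative gap {relative:.2f}")
```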

Funding Statement

The authors are grateful for the support of NSF DMS-1916211, NIH R35 CA220523, NIH R01 ES32418 and NIH U01CA209414.

Citation

Lu Zhang. Junwei Lu. "StarTrek: Combinatorial variable selection with false discovery rate control." Ann. Statist. 52 (1) 78 - 102, February 2024. https://doi.org/10.1214/23-AOS2296

Information

Received: 1 September 2021; Revised: 1 December 2022; Published: February 2024
First available in Project Euclid: 7 March 2024

MathSciNet: MR4718408
Digital Object Identifier: 10.1214/23-AOS2296

Subjects:
Primary: 62H15, 62H22
Secondary: 60F99

Keywords: combinatorial inference, comparison bounds, false discovery rate control, Gaussian multiplier bootstrap, graphical models, multiple testing

Rights: Copyright © 2024 Institute of Mathematical Statistics