Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects

David Donoho; Jiashun Jin

doi:10.1214/14-STS506

Statist. Sci. 30(1): 1-25 (February 2015). DOI: 10.1214/14-STS506

Abstract

In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it was applied to feature selection, where it provides a method for selecting useful predictive features from a large body of potentially useful features, among which only a rare few will prove truly useful.

In this article, we review the basics of HC in both the testing and feature selection settings. HC is a flexible idea, which adapts easily to new situations; we point out simple adaptations to clique detection and bivariate outlier detection. HC, although still early in its development, is seeing increasing interest from practitioners; we illustrate this with worked examples. HC is computationally effective, which gives it useful leverage in the increasingly relevant “Big Data” settings we see today.
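To make the testing setting concrete, here is a minimal sketch of one common form of the HC statistic computed from a vector of p-values. The abstract itself gives no formula, so everything here is an assumption made for illustration: the function name `higher_criticism`, the search fraction `alpha0`, the clipping constant, and the choice of standardizing by the sorted p-values (some presentations standardize by i/N instead). The demo values for `n`, `eps`, and `tau` are likewise arbitrary.

```python
import math

import numpy as np


def higher_criticism(pvalues, alpha0=0.5):
    """Return (HC statistic, 1-based index of the maximizing order statistic).

    Assumed form: HC = max over 1 <= i <= alpha0 * N of
        sqrt(N) * (i/N - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(N) are the sorted p-values.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Clip so that p-values of exactly 0 or 1 do not cause division by zero.
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    # Search only the smallest alpha0 fraction of p-values (at least one).
    k = max(1, int(np.floor(alpha0 * n)))
    j = int(np.argmax(hc[:k]))
    return float(hc[j]), j + 1


if __name__ == "__main__":
    # Illustrative rare/weak simulation: a small fraction eps of the
    # z-scores carries a modest mean shift tau; the rest are pure noise.
    rng = np.random.default_rng(0)
    n, eps, tau = 10_000, 0.01, 3.0
    z = rng.standard_normal(n)
    z[rng.random(n) < eps] += tau
    # One-sided p-values P(Z > z) for a standard normal Z.
    pvals = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])
    print(higher_criticism(pvals))
```

A large value of the statistic suggests that the smallest p-values are more extreme than a pure null background would produce. In the feature selection setting, the maximizing index is the quantity typically used to set a data-driven selection threshold.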

We also review the underlying theoretical “ideology” behind HC. The Rare/Weak (RW) model is a theoretical framework simultaneously controlling the size and prevalence of useful/significant items among the useless/null bulk. Analysis under the RW model shows that HC has important advantages over better-known procedures such as False Discovery Rate (FDR) control and Family-wise Error Rate (FwER) control; in particular, HC enjoys certain optimality properties. We discuss the rare/weak phase diagram, which clearly visualizes the class of RW settings where the true signals are so rare or so weak that detection and feature selection are simply impossible, and which also helps explain the known optimality properties of HC.
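As a worked illustration of the phase diagram idea, the block below writes out the Gaussian-mixture calibration of the RW model that is commonly used in the HC literature for the testing problem, together with the resulting detection boundary. The symbols N, \beta, r, \varepsilon_N, \tau_N and \rho^*(\beta) do not appear in the abstract and are introduced here as assumptions for the sketch.

```latex
% Rare/weak Gaussian mixture (assumed calibration):
% a fraction \varepsilon_N of the N test statistics has mean \tau_N.
X_i \overset{\text{iid}}{\sim} (1-\varepsilon_N)\, N(0,1) + \varepsilon_N\, N(\tau_N, 1),
\qquad
\varepsilon_N = N^{-\beta},\quad
\tau_N = \sqrt{2 r \log N},\quad
\tfrac{1}{2} < \beta < 1,\ 0 < r < 1.

% Detection boundary in the (\beta, r) plane:
\rho^*(\beta) =
\begin{cases}
\beta - \tfrac{1}{2}, & \tfrac{1}{2} < \beta \le \tfrac{3}{4},\\[4pt]
\bigl(1 - \sqrt{1-\beta}\bigr)^2, & \tfrac{3}{4} < \beta < 1.
\end{cases}
```

Under this calibration, reliable detection is asymptotically possible when r > \rho^*(\beta) and impossible when r < \rho^*(\beta); the optimality property alluded to above is that HC succeeds throughout the detectable region without needing to know \beta or r.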

Citation

David Donoho, Jiashun Jin. "Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects." Statist. Sci. 30 (1) 1 - 25, February 2015. https://doi.org/10.1214/14-STS506

Information

Published: February 2015

First available in Project Euclid: 4 March 2015

zbMATH: 1332.62019

MathSciNet: MR3317751

Digital Object Identifier: 10.1214/14-STS506

Keywords: classification, control of FDR, feature selection, higher criticism, large covariance matrix, large-scale inference, phase diagram, rare and weak effects, sparse signal detection
