Feature selection in omics prediction problems using cat scores and false nondiscovery rate control

Miika Ahdesmäki; Korbinian Strimmer

doi:10.1214/09-AOAS277

Ann. Appl. Stat. 4(1): 503-519 (March 2010). DOI: 10.1214/09-AOAS277

Abstract

We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted t-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false nondiscovery rates (FNDR). Third, training of the classifier is based on James–Stein shrinkage estimates of correlations and variances, where regularization parameters are chosen analytically without resampling. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package “sda” available from the R repository CRAN.
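As an illustration of the first and third steps, the following is a minimal sketch of two-class cat scores, assuming they are obtained by decorrelating ordinary t-scores with the inverse square root of a shrunken pooled correlation matrix. The function name `cat_scores` and the fixed shrinkage intensity `lam` are assumptions made for this example: the abstract states that the regularization parameters are chosen analytically, which this sketch does not attempt, and the sketch is not the API of the "sda" package.

```python
import numpy as np


def cat_scores(X, y, lam=0.1):
    """Two-class correlation-adjusted t-scores (illustrative sketch only).

    X   : (n, p) data matrix
    y   : length-n array of 0/1 class labels
    lam : shrinkage intensity toward the identity matrix; fixed by hand
          here (an assumption), not estimated analytically.
    Returns (cat, t): the cat scores and the ordinary t-scores.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    n0, n1 = len(X0), len(X1)
    n = n0 + n1

    # Pooled within-class residuals and variances
    R = np.vstack([X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)])
    var = (R ** 2).sum(axis=0) / (n - 2)

    # Ordinary two-sample t-scores, one per feature
    t = (X1.mean(axis=0) - X0.mean(axis=0)) / np.sqrt(var * (1 / n0 + 1 / n1))

    # Pooled correlation matrix, shrunk toward the identity
    sd = np.sqrt(var)
    P = (R.T @ R / (n - 2)) / np.outer(sd, sd)
    P = (1 - lam) * P + lam * np.eye(P.shape[0])

    # Inverse matrix square root V diag(w^-1/2) V^T via eigendecomposition
    w, V = np.linalg.eigh(P)
    P_inv_sqrt = (V * w ** -0.5) @ V.T

    # Decorrelate the t-scores
    return P_inv_sqrt @ t, t
```

With `lam=1.0` the correlation estimate collapses to the identity and the cat scores reduce to the ordinary t-scores, which corresponds to ignoring correlation among features. Features could then be ranked by the absolute value of their cat score; the FNDR-based choice of threshold described in the abstract is not implemented in this sketch.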

Citation

Miika Ahdesmäki, Korbinian Strimmer. "Feature selection in omics prediction problems using cat scores and false nondiscovery rate control." Ann. Appl. Stat. 4(1): 503-519, March 2010. https://doi.org/10.1214/09-AOAS277

Information

Published: March 2010

First available in Project Euclid: 11 May 2010

zbMATH: 1189.62102

MathSciNet: MR2758182

Digital Object Identifier: 10.1214/09-AOAS277

Keywords: “small n, large p” setting, correlation, correlation-adjusted t-score, false discovery rates, feature selection, higher criticism, James–Stein estimator, linear discriminant analysis