The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 8, Number 3 (2014), 1469-1491.
Rank discriminants for predicting phenotypes from RNA expression
Bahman Afsari, Ulisses M. Braga-Neto, and Donald Geman
Abstract
Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes and predicting clinical outcomes. Still, clinical applications remain scarce. One reason is that the complexity of the decision rules that emerge from standard statistical learning impedes biological understanding, in particular, any mechanistic interpretation. Here we explore decision rules for binary classification utilizing only the ordering of expression among several genes; the basic building blocks are then two-gene expression comparisons. The simplest example, just one comparison, is the TSP classifier, which has appeared in a variety of cancer-related discovery studies. Decision rules based on multiple comparisons can better accommodate class heterogeneity, and thereby increase accuracy, and might provide a link with biological mechanism. We consider a general framework (“rank-in-context”) for designing discriminant functions, including a data-driven selection of the number and identity of the genes in the support (“context”). We then specialize to two examples: voting among several pairs and comparing the median expression in two groups of genes. Comprehensive experiments assess accuracy relative to other, more complex, methods, and reinforce earlier observations that simple classifiers are competitive.
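To make the three decision rules named in the abstract concrete, the following is a minimal sketch of a single two-gene comparison (TSP), a vote among several pairs, and a comparison of medians between two gene groups. It covers prediction only. The function names, the 0/1 class labels, the direction of each comparison, and the toy expression values are illustrative assumptions, not the authors' implementation; the data-driven selection of genes described in the abstract is not shown.

```python
from statistics import median
import math

def tsp_predict(x, i, j):
    """Single comparison: return 1 if gene i is expressed below gene j, else 0.
    Which ordering maps to which class is an assumption made for illustration."""
    return int(x[i] < x[j])

def ktsp_predict(x, pairs):
    """Vote among several two-gene comparisons; a strict majority of 1-votes gives 1.
    An odd number of pairs avoids ties."""
    votes = sum(tsp_predict(x, i, j) for i, j in pairs)
    return int(2 * votes > len(pairs))

def tsm_predict(x, group_a, group_b):
    """Compare the median expression of two groups of genes (given as index lists)."""
    med_a = median(x[g] for g in group_a)
    med_b = median(x[g] for g in group_b)
    return int(med_a < med_b)

# Hypothetical expression profile for one sample (six genes).
profile = [5.1, 2.3, 7.8, 0.9, 4.4, 6.0]
pairs = [(1, 0), (3, 2), (5, 4)]  # hypothetical gene pairs

print(tsp_predict(profile, 1, 0))
print(ktsp_predict(profile, pairs))
print(tsm_predict(profile, [1, 3, 4], [0, 2, 5]))

# Only the ordering matters, so a monotone transform such as log2
# leaves every prediction unchanged.
log_profile = [math.log2(v) for v in profile]
print(ktsp_predict(log_profile, pairs) == ktsp_predict(profile, pairs))
```

Because each rule depends only on the ordering of expression values within a sample, the predictions are unchanged under any strictly increasing transformation of that sample's values, which the last line checks for a log transform.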
Article information
Source
Ann. Appl. Stat. Volume 8, Number 3 (2014), 1469-1491.
Dates
First available in Project Euclid: 23 October 2014
Permanent link to this document
http://projecteuclid.org/euclid.aoas/1414091221
Digital Object Identifier
doi:10.1214/14-AOAS738
Mathematical Reviews number (MathSciNet)
MR3271340
Zentralblatt MATH identifier
1304.62131
Keywords
Cancer classification; gene expression; rank discriminant; order statistics
Citation
Afsari, Bahman; Braga-Neto, Ulisses M.; Geman, Donald. Rank discriminants for predicting phenotypes from RNA expression. Ann. Appl. Stat. 8 (2014), no. 3, 1469-1491. doi:10.1214/14-AOAS738. http://projecteuclid.org/euclid.aoas/1414091221.
Supplemental materials
- Supplementary material A: Proposition S1. We provide the statement and proof of Proposition S1 as well as statistical tests for the assumptions made in Proposition S1. Digital Object Identifier: doi:10.1214/14-AOAS738SUPPA. Supplemental files available for subscribers.
- Supplementary material B: Notch-plots for classification accuracies. We provide notch-plots of the estimates of classification accuracy for every method and every data set based on ten runs of tenfold cross-validation. Digital Object Identifier: doi:10.1214/14-AOAS738SUPPB. Supplemental files available for subscribers.
- Supplementary material C: Algorithms for KTSP and TSM. We provide a summary of the algorithms for learning the KTSP and TSM classifiers. Digital Object Identifier: doi:10.1214/14-AOAS738SUPPC. Supplemental files available for subscribers.

