Optimal properties of centroid-based classifiers for very high-dimensional data

Peter Hall; Tung Pham

doi:10.1214/09-AOS736

Ann. Statist. 38(2): 1071-1093 (April 2010). DOI: 10.1214/09-AOS736

Abstract

We show that scale-adjusted versions of the centroid-based classifier enjoy optimal properties when used to discriminate between two very high-dimensional populations where the principal differences are in location. The scale adjustment removes the tendency of scale differences to confound differences in means. Certain other distance-based methods, for example, those founded on nearest-neighbor distance, do not have optimal performance in the sense that we propose. Our results permit varying degrees of sparsity and signal strength to be treated, and require only mild conditions on dependence of vector components. Additionally, we permit the marginal distributions of vector components to vary extensively. In addition to providing theory, we explore numerical properties of a centroid-based classifier, and show that these features reflect theoretical accounts of performance.
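As an illustration of the idea in the abstract, the following is a minimal sketch of a centroid-based classifier with one possible scale adjustment. It is not taken from the paper: the function names `fit_centroids` and `classify` are hypothetical, and the particular adjustment (subtracting the summed componentwise sample variances divided by the sample size from each squared distance) is an assumption about what a scale adjustment might look like, chosen because that term estimates the expected squared error of each sample centroid.

```python
import numpy as np

def fit_centroids(X, Y):
    """Return the two sample centroids and a bias estimate for each.

    X and Y are (sample size, dimension) arrays of training data.
    The bias terms are an assumed form of scale adjustment, not
    necessarily the statistic analysed in the paper.
    """
    m, n = X.shape[0], Y.shape[0]
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    # Sum of unbiased componentwise variances divided by sample size:
    # an estimate of E||x_bar - mu_X||^2 (and likewise for Y).
    bias_x = X.var(axis=0, ddof=1).sum() / m
    bias_y = Y.var(axis=0, ddof=1).sum() / n
    return x_bar, y_bar, bias_x, bias_y

def classify(z, x_bar, y_bar, bias_x, bias_y, adjust=True):
    """Return 0 if z is assigned to the X population, 1 otherwise."""
    d_x = np.sum((z - x_bar) ** 2)  # squared distance to X centroid
    d_y = np.sum((z - y_bar) ** 2)  # squared distance to Y centroid
    if adjust:
        # Remove the part of each distance that is due only to the
        # noise in the estimated centroid, so that scale differences
        # are less likely to masquerade as location differences.
        d_x -= bias_x
        d_y -= bias_y
    return 0 if d_x < d_y else 1

# Toy data: X is widely spread around 0, Y is concentrated at 1.
X = np.array([[-3.0], [3.0]])
Y = np.array([[1.0], [1.0]])
params = fit_centroids(X, Y)
unadjusted = classify(np.array([0.6]), *params, adjust=False)
adjusted = classify(np.array([0.6]), *params, adjust=True)
```

In this toy example the unadjusted rule and the adjusted rule can disagree on the same point, because the noisier X centroid contributes a larger bias term to its squared distance. With `adjust=False` the function reduces to the plain nearest-centroid rule.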

Citation

Peter Hall, Tung Pham. "Optimal properties of centroid-based classifiers for very high-dimensional data." Ann. Statist. 38 (2) 1071-1093, April 2010. https://doi.org/10.1214/09-AOS736

Information

Published: April 2010

First available in Project Euclid: 19 February 2010

zbMATH: 1183.62104

MathSciNet: MR2604705

Digital Object Identifier: 10.1214/09-AOS736

Subjects:

Primary: 62H30

Keywords: Centroid method, classification, discrimination, distance-based classifiers, High-dimensional data, location differences, minimax performance, scale adjustment, Sparsity
