Open Access
VOL. 45 | 2004
Estimating gradient trees
Ming-Yen Cheng, Peter Hall, John A. Hartigan

Editor(s) Anirban DasGupta

IMS Lecture Notes Monogr. Ser., 2004: 237-249 (2004) DOI: 10.1214/lnms/1196285394

Abstract

With applications to cluster analysis in mind, we suggest new approaches to constructing tree diagrams that describe associations among points in a scatterplot. Our most basic tree diagram results in two data points being associated with one another if and only if their respective curves of steepest ascent up the density or intensity surface lead toward the same mode. The representation, in the sample space, of the set of steepest ascent curves corresponding to the data, is called the gradient tree. It has a regular, octopuslike structure, and is consistently estimated by its analogue computed from a nonparametric estimator which gives consistent estimation of both the density surface and its derivatives. We also suggest ‘forests’, in which data are linked by line segments which represent good approximations to portions of the population gradient tree. A forest is closely related to a minimum spanning tree, or MST, defined as the graph of minimum total length connecting all sample points. However, forests use a larger bandwidth for constructing the density-surface estimate than is implicit in the MST, with the result that they are substantially more orderly and are more readily interpreted. The effective bandwidth for the MST is so small that even the corresponding density-surface estimate, let alone its derivatives, is inconsistent. As a result, relationships that are suggested by the MST can change considerably if relatively small quantities of data are added or removed. Our trees and forests do not suffer from this problem. They are related to the concept of gradient traces, introduced by Wegman, Carr and Luo (1993) and Wegman and Carr (1993) for purposes quite different from our own.
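The basic association rule described above (two data points are linked when their uphill paths on the estimated density reach the same mode) can be illustrated with a short sketch. The abstract does not specify the estimator or the numerical scheme, so what follows is an assumption-laden stand-in rather than the authors' construction: it uses a Gaussian kernel density estimate and mean-shift iterations, whose steps point along the estimated density gradient and so only approximate the continuous steepest-ascent curves. The function names `mean_shift_modes` and `group_by_mode`, and the parameters `bandwidth` and `merge_radius`, are hypothetical.

```python
import numpy as np


def mean_shift_modes(points, bandwidth, n_iter=200, tol=1e-6):
    """Move a copy of every data point uphill on a Gaussian KDE.

    Assumption: mean shift stands in for exact steepest ascent. Each step
    moves a point along the direction of the estimated density gradient.
    """
    points = np.asarray(points, dtype=float)
    current = points.copy()
    for _ in range(n_iter):
        # Squared distances from each moving point to every data point.
        diff = current[:, None, :] - points[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=2)
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        # Kernel-weighted average of the data: the mean-shift update.
        updated = weights @ points / weights.sum(axis=1, keepdims=True)
        largest_move = np.max(np.linalg.norm(updated - current, axis=1))
        current = updated
        if largest_move < tol:
            break
    return current


def group_by_mode(endpoints, merge_radius):
    """Give the same label to endpoints lying within merge_radius of a mode."""
    labels = -np.ones(len(endpoints), dtype=int)
    modes = []
    for i, p in enumerate(endpoints):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_radius:
                labels[i] = k
                break
        else:
            # No existing mode is close enough, so record a new one.
            modes.append(p)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)


# Illustrative data (made up for this sketch): two small, well-separated groups.
offsets = np.array([[0.0, 0.0], [0.2, 0.0], [-0.2, 0.0], [0.0, 0.2], [0.0, -0.2]])
data = np.vstack([offsets, offsets + np.array([5.0, 5.0])])

endpoints = mean_shift_modes(data, bandwidth=1.0)
labels, modes = group_by_mode(endpoints, merge_radius=0.5)
print(labels)
print(modes)
```

The choice of `bandwidth` plays the role the abstract assigns to the smoothing level: a very small value makes almost every point its own mode, which mirrors the instability attributed to the minimum spanning tree, while a larger value yields fewer, more stable groupings.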

Information

Published: 1 January 2004
First available in Project Euclid: 28 November 2007

zbMATH: 1268.62038
MathSciNet: MR2126901

Subjects:
Primary: 62H30
Secondary: 62H20

Keywords: density ascent line, density estimation, forest, gradient trace, minimum spanning tree, nearest neighbour methods, ridge estimation, tree diagram

Rights: Copyright © 2004, Institute of Mathematical Statistics
