Abstract
Principal curves were introduced to formalize the notion of "a curve passing through the middle of a dataset." Vaguely speaking, a curve is said to pass through the middle of a dataset if every point on the curve is the average of the observations projecting onto it. This idea can be made precise by defining principal curves for probability densities. In this paper we study principal curves in the plane. Like linear principal components, principal curves are critical points of the expected squared distance from the data. However, the largest and smallest principal components are extrema of the distance, whereas all principal curves are saddle points. This explains why cross-validation does not appear to be a viable method for choosing the complexity of principal curve estimates.
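As an illustration of the self-consistency idea described above (every point on the curve is the average of the observations projecting onto it), the following minimal sketch checks the property numerically for the simplest case, a straight line. Everything in it is an assumption made for illustration and is not taken from the paper: the toy Gaussian sample, the binning scheme, and the names `self_consistency_gap` and `offset` are hypothetical.

```python
import random


def self_consistency_gap(offset=0.0, n=20000, n_bins=10, seed=0):
    """Return the largest absolute bin-wise mean residual for a horizontal line.

    The candidate curve is f(t) = (t, offset). For a self-consistent curve,
    the observations projecting near f(t) should average to f(t), so the
    mean residual in every bin should be close to zero.
    """
    rng = random.Random(seed)
    # Hypothetical toy data: independent Gaussian coordinates with standard
    # deviation 2 along x and 1 along y, so the x-axis is the first
    # principal component line of the underlying density.
    points = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

    # A point (x, y) projects orthogonally onto f(x), so its projection
    # index is t = x and its signed residual off the curve is y - offset.
    lo, hi = -4.0, 4.0
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in points:
        if lo <= x < hi:
            b = min(int((x - lo) / width), n_bins - 1)  # guard against rounding
            sums[b] += y - offset
            counts[b] += 1

    # Average residual of the observations projecting into each bin.
    return max(abs(s / c) for s, c in zip(sums, counts) if c > 0)


if __name__ == "__main__":
    print("line through the mean:", self_consistency_gap(offset=0.0))
    print("shifted line:         ", self_consistency_gap(offset=0.5))
```

Under these assumptions, the line through the mean should give a gap near zero up to sampling noise, while the shifted line should give a gap close to the size of the shift, which is one way to see that the shifted line does not pass through the middle of the data.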
Citation
Tom Duchamp, Werner Stuetzle. "Extremal properties of principal curves in the plane." Ann. Statist. 24 (4) 1511-1520, August 1996. https://doi.org/10.1214/aos/1032298280
Information