Asymptotic comparison of (partial) cross-validation, GCV and randomized GCV in nonparametric regression

Didier A. Girard

doi:10.1214/aos/1030563988

Ann. Statist. 26(1): 315-334 (February 1998). DOI: 10.1214/aos/1030563988

Abstract

When nonparametric estimates are used for the mean curve, surface or image underlying noisy observations, the selection of "smoothing parameters" is generally crucial. This paper gives a theoretical comparison of the performance of generalized cross-validation (GCV) and of its fast randomized version (RGCV) as selection criteria. This is mainly done by studying the asymptotic distribution of the excess error for each selector, that is, the difference between the resulting (data-driven) average squared error (ASE) and the best possible ASE. We show that, with randomization, this distribution is dilated, compared to that for CV or GCV, only by a factor always lower than $1 + 1/n_R$, where $n_R$ is the number of primary randomized trace estimates used in RGCV. The compared selectors also include the partial cross-validation (PCV) approach, in which only a fraction of all the possible "leave-one-out" validation tests are evaluated; PCV is a common practice for reducing the computational cost in many contexts. In this paper, PCV will in fact appear quite inefficient compared to RGCV from this computational point of view. Moreover, we show that a precise comparison (and an interpretation of the gain from using $n_R \geq 2$) is possible in terms of equivalent (in distribution) excess errors, if PCV uses a certain percentage of the test points greater than 50%. The resulting comparisons will be seen as quite reassuring about what is "sacrificed" in using randomized selectors. We give rigorous results mainly for the kernel regression setting, as in the previous detailed study of standard selectors by Härdle, Hall and Marron, except that we do not restrict this setting to an equidistant design.
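The three selectors compared in the abstract can be illustrated with a minimal sketch. The abstract does not state the criteria explicitly, so everything here is an assumption made for illustration: a Gaussian-kernel Nadaraya-Watson smoother with smoother matrix $A_h$, the standard GCV score $n^{-1}\|(I - A_h)y\|^2 / (1 - \mathrm{tr}(A_h)/n)^2$, a randomized estimate of $\mathrm{tr}(A_h)/n$ of the form $w^T A_h w / w^T w$ averaged over $n_R$ random vectors $w$, and PCV as the mean squared leave-one-out residual over a chosen subset of test points. The function names (`smoother_matrix`, `gcv`, `rgcv`, `pcv`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def smoother_matrix(x, h):
    """Nadaraya-Watson smoother matrix A_h (Gaussian kernel; an illustrative choice)."""
    d = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * d ** 2)
    return K / K.sum(axis=1, keepdims=True)  # each row sums to 1

def gcv(x, y, h):
    """Standard GCV score, using the exact trace of A_h."""
    A = smoother_matrix(x, h)
    n = len(y)
    rss = np.mean((y - A @ y) ** 2)
    return rss / (1.0 - np.trace(A) / n) ** 2

def rgcv(x, y, h, W):
    """Randomized GCV: tr(A_h)/n is replaced by the average of
    w^T A_h w / w^T w over the n_R columns of W (assumed form of the estimate)."""
    A = smoother_matrix(x, h)
    rss = np.mean((y - A @ y) ** 2)
    ratios = np.sum(W * (A @ W), axis=0) / np.sum(W * W, axis=0)
    return rss / (1.0 - ratios.mean()) ** 2

def pcv(x, y, h, test_idx):
    """Partial CV: mean squared leave-one-out residual over test_idx only.
    For this smoother the leave-one-out residual is (y_i - (A y)_i) / (1 - A_ii)."""
    A = smoother_matrix(x, h)
    loo_resid = (y - A @ y) / (1.0 - np.diag(A))
    return np.mean(loo_resid[test_idx] ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_R = 200, 2
    x = np.sort(rng.uniform(0.0, 1.0, n))       # non-equidistant design
    y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)
    W = rng.standard_normal((n, n_R))           # probe vectors, fixed across bandwidths
    test_idx = rng.choice(n, size=n // 2, replace=False)
    grid = np.linspace(0.01, 0.3, 30)
    for name, crit in [("GCV", lambda h: gcv(x, y, h)),
                       ("RGCV", lambda h: rgcv(x, y, h, W)),
                       ("PCV", lambda h: pcv(x, y, h, test_idx))]:
        scores = [crit(h) for h in grid]
        print(name, "selected bandwidth:", grid[int(np.argmin(scores))])
```

This sketch forms the full matrix $A_h$ for readability, so it does not show the computational saving itself. In a fast implementation only the $n_R$ products $A_h w$ would be computed by applying the smoother to each $w$, which is what makes the randomized version cheap.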

Citation

Didier A. Girard. "Asymptotic comparison of (partial) cross-validation, GCV and randomized GCV in nonparametric regression." Ann. Statist. 26 (1) 315 - 334, February 1998. https://doi.org/10.1214/aos/1030563988

Information

Published: February 1998

First available in Project Euclid: 28 August 2002

zbMATH: 0932.62047

MathSciNet: MR1608164

Digital Object Identifier: 10.1214/aos/1030563988

Subjects:

Primary: 62G07, 62G20

Secondary: 62G09, 62J07, 65U05

Keywords: $C_L$ method, bandwidth selection, cross-validation, fast randomized versions of GCV or $C_L$, generalized cross-validation, nonparametric regression, partial cross-validation, regularization

JOURNAL ARTICLE
20 PAGES
