## The Annals of Statistics

### Cross-validation in nonparametric regression with outliers

Denis Heng-Yan Leung

#### Abstract

A popular data-driven method for choosing the bandwidth in standard kernel regression is cross-validation. Even when there are outliers in the data, robust kernel regression can be used to estimate the unknown regression curve [Robust and Nonlinear Time Series Analysis. Lecture Notes in Statist. (1984) 26 163–184]. However, under these circumstances standard cross-validation is no longer a satisfactory bandwidth selector because it is unduly influenced by extreme prediction errors caused by the existence of these outliers. A more robust method proposed here is a cross-validation method that discounts the extreme prediction errors. In large samples the robust method chooses consistent bandwidths, and the consistency of the method is practically independent of the form in which extreme prediction errors are discounted. Additionally, evaluation of the method’s finite sample behavior in a simulation demonstrates that the proposed method performs favorably. This method can also be applied to other problems, for example, model selection, that require cross-validation.
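To make the idea of discounting extreme prediction errors concrete, the following is a minimal sketch and not the paper's implementation. It assumes a Gaussian kernel, a plain Nadaraya-Watson leave-one-out smoother (the paper works with robust kernel regression, which is omitted here for brevity), and a Huber loss with an arbitrary cutoff `c = 1.0` as one possible way to discount large errors. All function names, the bandwidth grid, and the simulated data are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel (the constant cancels in the ratio)."""
    return math.exp(-0.5 * u * u)


def loo_predictions(x, y, h):
    """Leave-one-out Nadaraya-Watson prediction at each x[i] for bandwidth h."""
    preds = []
    for i in range(len(x)):
        num = den = 0.0
        for j in range(len(x)):
            if j == i:
                continue  # leave observation i out of its own prediction
            w = gaussian_kernel((x[i] - x[j]) / h)
            num += w * y[j]
            den += w
        # If every weight underflows, treat the prediction as unusable.
        preds.append(num / den if den > 0 else math.inf)
    return preds


def huber(r, c):
    """Huber loss: quadratic for |r| <= c, linear beyond (assumed discounting)."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * a - 0.5 * c * c


def cv_score(x, y, h, loss):
    """Average loss of the leave-one-out prediction errors at bandwidth h."""
    preds = loo_predictions(x, y, h)
    return sum(loss(yi - pi) for yi, pi in zip(y, preds)) / len(x)


def select_bandwidth(x, y, grid, loss):
    """Return the bandwidth in grid that minimises the CV score."""
    return min(grid, key=lambda h: cv_score(x, y, h, loss))


def squared_loss(r):
    return 0.5 * r * r  # standard cross-validation criterion


def robust_loss(r):
    return huber(r, 1.0)  # cutoff 1.0 is an arbitrary illustrative choice


# Simulated data (illustrative only): a sine curve with a few gross outliers.
rng = random.Random(0)
x = sorted(rng.random() for _ in range(60))
y = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.2) for xi in x]
for k in range(0, len(y), 10):
    y[k] += 5.0  # contaminate every tenth observation

grid = [0.02, 0.05, 0.1, 0.2, 0.4]
h_plain = select_bandwidth(x, y, grid, squared_loss)
h_robust = select_bandwidth(x, y, grid, robust_loss)
print("standard CV bandwidth:", h_plain)
print("robust CV bandwidth:  ", h_robust)
```

Swapping `robust_loss` for another bounded or slowly growing loss changes only the argument passed to `select_bandwidth`, which mirrors the abstract's remark that consistency is practically independent of the form of the discounting.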

#### Article information

Source
Ann. Statist., Volume 33, Number 5 (2005), 2291–2310.

Dates
First available in Project Euclid: 25 November 2005

Permanent link to this document
https://projecteuclid.org/euclid.aos/1132936564

Digital Object Identifier
doi:10.1214/009053605000000499

Mathematical Reviews number (MathSciNet)
MR2211087

Zentralblatt MATH identifier
1086.62055

#### Citation

Leung, Denis Heng-Yan. Cross-validation in nonparametric regression with outliers. Ann. Statist. 33 (2005), no. 5, 2291–2310. doi:10.1214/009053605000000499. https://projecteuclid.org/euclid.aos/1132936564

