Optimal cross-validation in density estimation with the $L^{2}$-loss

Alain Celisse

doi:10.1214/14-AOS1240

Ann. Statist. 42(5): 1879-1910 (October 2014). DOI: 10.1214/14-AOS1240

Abstract

We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is on the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions substantially improve upon $V$-fold cross-validation in terms of variability and computational complexity.
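To illustrate what a closed-form Lpo expression looks like, the sketch below treats the simplest projection estimator, a regular histogram on $[0,1]$. Writing $n_\lambda$ for the number of observations in bin $\lambda$ and $h$ for the common bin width, averaging the $L^{2}$ contrast over all test sets of size $p$ (a hypergeometric moment computation) gives $\sum_\lambda [(2n-p)\,n_\lambda - (n-p+1)\,n_\lambda^{2}] / [n(n-1)(n-p)\,h]$. The function names, the interval $[0,1]$, and the brute-force check are illustrative assumptions, not code or notation taken from the paper.

```python
from itertools import combinations
from math import isclose

import numpy as np


def lpo_risk_closed_form(x, n_bins, p, low=0.0, high=1.0):
    """Closed-form leave-p-out L2 risk estimate of a regular histogram.

    Assumes all observations lie in [low, high] (illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if not 1 <= p <= n - 1:
        raise ValueError("p must satisfy 1 <= p <= n - 1")
    edges = np.linspace(low, high, n_bins + 1)
    counts, _ = np.histogram(x, bins=edges)
    counts = counts.astype(float)
    width = (high - low) / n_bins
    # Per-bin term: (2n - p) * n_lambda - (n - p + 1) * n_lambda**2
    numerator = (2 * n - p) * counts.sum() - (n - p + 1) * np.sum(counts ** 2)
    return numerator / (n * (n - 1) * (n - p) * width)


def lpo_risk_bruteforce(x, n_bins, p, low=0.0, high=1.0):
    """Same quantity by enumerating every test set of size p (small n only)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    edges = np.linspace(low, high, n_bins + 1)
    width = (high - low) / n_bins
    total, n_splits = 0.0, 0
    for test in combinations(range(n), p):
        is_train = np.ones(n, dtype=bool)
        is_train[list(test)] = False
        counts, _ = np.histogram(x[is_train], bins=edges)
        heights = counts / ((n - p) * width)  # histogram fitted on n - p points
        # Bin index of each held-out point (last bin is closed on the right).
        idx = np.searchsorted(edges, x[~is_train], side="right") - 1
        idx = np.clip(idx, 0, n_bins - 1)
        # L2 contrast: integral of the squared estimator minus twice its
        # average value at the held-out points.
        total += np.sum(heights ** 2) * width - 2.0 * np.mean(heights[idx])
        n_splits += 1
    return total / n_splits


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.random(10)
    for p in (1, 2, 3):
        print(p, lpo_risk_closed_form(sample, 4, p), lpo_risk_bruteforce(sample, 4, p))
```

The closed form costs one pass over the data for any $p$, whereas the enumeration visits $\binom{n}{p}$ splits, which is the computational gap the abstract refers to.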

From a theoretical point of view, closed-form expressions also make it possible to study the performance of Lpo in terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo with $p=1$, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size $n$, optimality is achieved for $p$ large enough [with $p/n=o(1)$] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is established for Lpo as long as $p/n$ is suitably related to the rate of convergence of the best estimator in the collection: (i) $p/n\to1$ as $n\to+\infty$ with a parametric rate, and (ii) $p/n=o(1)$ with some nonparametric estimators. These theoretical results are validated by simulation experiments.
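As a rough illustration of Lpo-based model selection, the sketch below picks the number of bins of a regular histogram on $[0,1]$ by minimizing a closed-form Lpo criterion, for several values of $p$. The histogram collection, the Beta(2, 5) sample, the candidate range, and the function names are all hypothetical choices made for this example; they are not the experimental setup of the paper.

```python
import numpy as np


def lpo_criterion(x, n_bins, p):
    """Leave-p-out L2 risk estimate of a regular histogram on [0, 1].

    Uses the per-bin closed form
    ((2n - p) * n_lambda - (n - p + 1) * n_lambda**2) / (n (n - 1) (n - p) h)
    with bin width h = 1 / n_bins.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    counts, _ = np.histogram(x, bins=np.linspace(0.0, 1.0, n_bins + 1))
    counts = counts.astype(float)
    numerator = (2 * n - p) * counts.sum() - (n - p + 1) * np.sum(counts ** 2)
    return n_bins * numerator / (n * (n - 1) * (n - p))


def select_bins(x, candidates, p):
    """Return the candidate number of bins with the smallest Lpo criterion."""
    scores = [lpo_criterion(x, d, p) for d in candidates]
    return candidates[int(np.argmin(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.beta(2.0, 5.0, size=500)   # arbitrary test density on (0, 1)
    candidates = list(range(1, 51))    # arbitrary model collection
    for p in (1, 50, 250, 450):
        print("p =", p, "-> selected bins:", select_bins(x, candidates, p))
```

Running the loop for several values of $p$ shows how the selected model depends on the test-set size, which is the quantity the estimation and identification results above constrain.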

Citation


Alain Celisse. "Optimal cross-validation in density estimation with the $L^{2}$-loss." Ann. Statist. 42 (5) 1879 - 1910, October 2014. https://doi.org/10.1214/14-AOS1240

Information

Published: October 2014

First available in Project Euclid: 11 September 2014

zbMATH: 1305.62179

MathSciNet: MR3262471

Digital Object Identifier: 10.1214/14-AOS1240

Subjects:

Primary: 62G09

Secondary: 62E17 , 62G07

Keywords: Concentration inequalities , cross-validation , Density estimation , leave-$p$-out , Model selection , Oracle inequality , projection estimators , Resampling , risk estimation


JOURNAL ARTICLE
32 PAGES
