Estimator selection in the Gaussian setting

Yannick Baraud; Christophe Giraud; Sylvie Huet

doi:10.1214/13-AIHP539

Ann. Inst. H. Poincaré Probab. Statist. 50(3): 1092-1119 (August 2014). DOI: 10.1214/13-AIHP539

Abstract

We consider the problem of estimating the mean $f$ of a Gaussian vector $Y$ with independent components of common unknown variance $\sigma^{2}$. Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection $\mathbb{F}$ of estimators of $f$ based on $Y$ and, with the same data $Y$, aim at selecting an estimator among $\mathbb{F}$ with the smallest Euclidean risk. No assumptions on the estimators are made and their dependence on $Y$ may be unknown. We establish a non-asymptotic risk bound for the selected estimator and derive oracle-type inequalities when $\mathbb{F}$ consists of linear estimators. As particular cases, our approach makes it possible to handle the problems of aggregation and model selection, as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. In all these cases but aggregation, the method can be easily implemented. For illustration, we carry out two simulation studies. One compares our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.

We present a new estimator selection procedure for estimating the mean $f$ of a vector $Y$ of $n$ independent Gaussian variables whose variance is unknown. We propose to choose an estimator of $f$, with the aim of minimizing the $l_{2}$ risk, from an arbitrary and possibly infinite collection $\mathbb{F}$ of estimators. Both the selection procedure and the collection $\mathbb{F}$ depend only on the observations $Y$. We derive a non-asymptotic risk bound that requires no assumption on the estimators in $\mathbb{F}$, nor knowledge of how they depend on $Y$. We derive oracle-type inequalities when $\mathbb{F}$ is a collection of linear estimators. We consider several particular cases: estimation by aggregation, estimation by model selection, choice of a window and of the smoothing parameter in functional regression, and choice of the regularization parameter in a penalized criterion. In all these particular cases, except for aggregation methods, the method is very easy to implement. For illustration, we present simulation results with two objectives: comparing our method to cross-validation, and showing how to implement it for variable selection.
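To make the setting concrete, the following minimal Python sketch simulates the model $Y = f + \sigma\varepsilon$ described in the abstract, builds a small collection of linear estimators of $f$, and identifies the oracle, that is, the estimator in the collection with the smallest Euclidean loss on this draw. It does not reproduce the selection criterion of the paper, which is not stated in the abstract. The choice of ridge regression, the design matrix `X`, the grid `lambdas`, and the helper names `ridge_estimators` and `oracle_index` are assumptions made for illustration only. The oracle uses the true mean $f$, which is unknown in practice, so it serves as a benchmark rather than a usable procedure.

```python
import numpy as np


def ridge_estimators(X, y, lambdas):
    """Hypothetical helper: one fitted mean vector per penalty value.

    Each fit is the linear estimator f_hat = X (X'X + lam * I)^{-1} X' y.
    """
    p = X.shape[1]
    gram = X.T @ X
    fits = []
    for lam in lambdas:
        beta = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)
        fits.append(X @ beta)
    return np.array(fits)


def oracle_index(f, fits):
    """Hypothetical helper: index of the fit closest to the true mean f.

    Uses the squared Euclidean loss ||f - f_hat||^2 on a single draw,
    which is a one-sample stand-in for the Euclidean risk.
    """
    losses = np.sum((fits - f) ** 2, axis=1)
    return int(np.argmin(losses)), losses


# Simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, p, sigma = 100, 20, 1.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
f = X @ beta_true                        # true mean of Y
y = f + sigma * rng.standard_normal(n)   # observed Gaussian vector

lambdas = np.array([0.01, 0.1, 1.0, 10.0, 100.0])
fits = ridge_estimators(X, y, lambdas)
best, losses = oracle_index(f, fits)
print("oracle lambda:", lambdas[best])
print("losses:", np.round(losses, 2))
```

A data-driven selection rule, such as the one studied in the paper or cross-validation, has to pick an index from `y` alone. An oracle-type inequality then bounds the risk of the selected estimator in terms of the risk attained by the oracle.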

Citation

Yannick Baraud, Christophe Giraud, Sylvie Huet. "Estimator selection in the Gaussian setting." Ann. Inst. H. Poincaré Probab. Statist. 50 (3) 1092-1119, August 2014. https://doi.org/10.1214/13-AIHP539

Information

Published: August 2014

First available in Project Euclid: 20 June 2014

zbMATH: 1298.62113

MathSciNet: MR3224300

Digital Object Identifier: 10.1214/13-AIHP539

Subjects:

Primary: 62G08, 62J05, 62J07

Keywords: Elastic net, Estimator selection, Kernel estimator, Lasso, Linear estimator, Model selection, PLS1 regression, Random forest, Ridge regression, Variable selection

JOURNAL ARTICLE
28 PAGES