Sparsity in penalized empirical risk minimization

Vladimir Koltchinskii

doi:10.1214/07-AIHP146

Ann. Inst. H. Poincaré Probab. Statist. 45(1): 7-57 (February 2009). DOI: 10.1214/07-AIHP146

Abstract

Let $(X, Y)$ be a random couple in $S\times T$ with unknown distribution $P$. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be i.i.d. copies of $(X, Y)$, with $P_n$ denoting their empirical distribution. Let $h_1, \dots, h_N: S\mapsto[-1, 1]$ be a dictionary consisting of $N$ functions. For $\lambda\in\mathbb{R}^N$, denote $f_\lambda:=\sum_{j=1}^N\lambda_jh_j$. Let $\ell: T\times\mathbb{R}\mapsto\mathbb{R}$ be a given loss function that is convex with respect to the second variable. Denote $(\ell\bullet f)(x, y):=\ell(y; f(x))$. We study the following penalized empirical risk minimization problem $$\hat{\lambda}^{\varepsilon }:=\mathop{\operatorname {argmin}}_{\lambda\in {\mathbb{R}}^{N}}\bigl[P_{n}(\ell\bullet f_{\lambda})+\varepsilon \|\lambda\|_{\ell_{p}}^{p}\bigr],$$ which is an empirical version of the problem $$\lambda^{\varepsilon }:=\mathop{\operatorname{argmin}}_{\lambda\in {\mathbb{R}}^{N}}\bigl[P(\ell \bullet f_{\lambda})+\varepsilon \|\lambda\|_{\ell_{p}}^{p}\bigr]$$ (here $\varepsilon\geq 0$ is a regularization parameter; $\lambda^0$ corresponds to $\varepsilon=0$). A number of regression and classification problems fit this general framework. We are interested in the case when $p\geq 1$ is close enough to 1 (so that $p-1$ is of the order $\frac{1}{\log N}$, or smaller). We show that the “sparsity” of $\lambda^\varepsilon$ implies the “sparsity” of $\hat{\lambda}^\varepsilon$ and study the impact of “sparsity” on bounding the excess risk $P(\ell\bullet f_{\hat{\lambda}^\varepsilon})-P(\ell\bullet f_{\lambda^0})$ of solutions of empirical risk minimization problems.
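
To make the empirical problem concrete, the following Python sketch minimizes $P_n(\ell\bullet f_\lambda)+\varepsilon\|\lambda\|_{\ell_p}^p$ for one particular instance. Everything specific in it is an assumption made for illustration and is not taken from the paper: the squared loss $\ell(y; u)=(y-u)^2$, the random dictionary values, the synthetic data, the choice $p=1+\frac{1}{\log N}$, and plain gradient descent as the solver. The names `objective` and `penalized_erm` are hypothetical.

```python
import numpy as np

def objective(lam, H, y, eps, p):
    """Empirical risk with squared loss plus eps * ||lam||_p^p.

    H[i, j] holds h_j(X_i), assumed to lie in [-1, 1].
    """
    resid = y - H @ lam
    return np.mean(resid ** 2) + eps * np.sum(np.abs(lam) ** p)

def penalized_erm(H, y, eps, p, steps=2000, lr=0.05):
    """Plain gradient descent on the penalized empirical risk (illustrative solver).

    For p > 1 the penalty is differentiable, with gradient
    eps * p * sign(lam) * |lam|^(p - 1).
    """
    n, N = H.shape
    lam = np.zeros(N)
    for _ in range(steps):
        grad_risk = -2.0 * H.T @ (y - H @ lam) / n
        grad_pen = eps * p * np.sign(lam) * np.abs(lam) ** (p - 1.0)
        lam = lam - lr * (grad_risk + grad_pen)
    return lam

# Synthetic example (assumed setup): N = 50 dictionary functions, n = 200 samples,
# and a target that depends on only three of the functions.
rng = np.random.default_rng(0)
n, N = 200, 50
H = rng.uniform(-1.0, 1.0, size=(n, N))
lam_true = np.zeros(N)
lam_true[:3] = [1.0, -1.0, 1.0]
y = H @ lam_true + 0.1 * rng.standard_normal(n)

p = 1.0 + 1.0 / np.log(N)   # p - 1 of order 1 / log N
eps = 0.1
lam_hat = penalized_erm(H, y, eps, p)

print("penalized risk at 0:      ", objective(np.zeros(N), H, y, eps, p))
print("penalized risk at lam_hat:", objective(lam_hat, H, y, eps, p))
print("three largest |lam_hat_j|:", np.sort(np.argsort(-np.abs(lam_hat))[:3]))
```

Because $p>1$, the coordinates of the computed minimizer outside the true support are small rather than exactly zero; in this sketch, "sparsity" shows up as the mass of `lam_hat` concentrating on a few coordinates.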

Citation

Vladimir Koltchinskii. "Sparsity in penalized empirical risk minimization." Ann. Inst. H. Poincaré Probab. Statist. 45 (1) 7 - 57, February 2009. https://doi.org/10.1214/07-AIHP146