Open Access
On the optimality of the aggregate with exponential weights for low temperatures
Guillaume Lecué, Shahar Mendelson
Bernoulli 19(2): 646-675 (May 2013). DOI: 10.3150/11-BEJ408

Abstract

Given a finite class of functions $F$, the problem of aggregation is to construct a procedure whose risk is as close as possible to the risk of the best element in the class. A classical procedure (PAC-Bayesian Statistical Learning Theory (2004), Paris 6; Statistical Learning Theory and Stochastic Optimization (2001), Springer; Ann. Statist. 28 (2000) 75–87) is the aggregate with exponential weights (AEW), defined by

\[\tilde{f}^{\mathrm{AEW}}=\sum_{f\in F}\widehat{\theta}(f)f,\qquad\mbox{where }\widehat{\theta}(f)=\frac{\exp(-({n}/{T})R_{n}(f))}{\sum_{g\in F}\exp(-({n}/{T})R_{n}(g))},\]

where $T>0$ is called the temperature parameter and $R_{n}(\cdot)$ is an empirical risk.
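
The definition above can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes the empirical risks $R_{n}(f)$ have already been computed for each element of a finite dictionary, and the function names `aew_weights` and `aew_predict` are hypothetical. The only addition to the formula is a shift by the largest exponent before exponentiating, which leaves the normalized weights unchanged and avoids numerical underflow when $n/T$ is large.

```python
import math


def aew_weights(empirical_risks, n, temperature):
    """Weights theta_hat(f) proportional to exp(-(n/T) * R_n(f)).

    empirical_risks: list of R_n(f), one per dictionary element (assumed given).
    n: sample size; temperature: the parameter T > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    exponents = [-(n / temperature) * r for r in empirical_risks]
    # Subtracting the maximum exponent cancels in the ratio, so the
    # weights are the same as in the formula, but exp() cannot overflow
    # and the largest term is exactly 1.
    shift = max(exponents)
    unnormalized = [math.exp(e - shift) for e in exponents]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def aew_predict(weights, predictions):
    """Value of the aggregate at one point: sum over f of theta_hat(f) * f(x)."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical empirical risks for a three-element dictionary.
risks = [0.5, 0.2, 0.9]
weights = aew_weights(risks, n=100, temperature=1.0)
```

As $T$ decreases with $n$ fixed, the weights computed this way put nearly all their mass on the element with the smallest empirical risk, so the aggregate behaves like empirical risk minimization over the dictionary; when all empirical risks are equal, the weights are uniform.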

In this article, we study the optimality of the AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation with respect to the quadratic risk when $T\leq c_{1}$, where $c_{1}$ is an absolute positive constant (the low-temperature regime), and that it is suboptimal in probability even for high temperatures. Second, we show that as the cardinality of the dictionary grows, the behavior of AEW might deteriorate, namely, that in the low-temperature regime it might concentrate with high probability around elements in the dictionary with risk greater than the risk of the best function in the dictionary by at least an order of $1/\sqrt{n}$. Third, we prove that if a geometric condition on the dictionary (the so-called “Bernstein condition”) is assumed, then AEW is indeed optimal both in high probability and in expectation in the low-temperature regime. Moreover, under that assumption, the complexity term is essentially the logarithm of the cardinality of the set of “almost minimizers” rather than the logarithm of the cardinality of the entire dictionary. This result holds for small values of the temperature parameter, thus complementing an analogous result for high temperatures.

Citation


Guillaume Lecué, Shahar Mendelson. "On the optimality of the aggregate with exponential weights for low temperatures." Bernoulli 19 (2) 646-675, May 2013. https://doi.org/10.3150/11-BEJ408

Information

Published: May 2013
First available in Project Euclid: 13 March 2013

zbMATH: 06168767
MathSciNet: MR3037168
Digital Object Identifier: 10.3150/11-BEJ408

Keywords: Aggregation, empirical process, Gaussian approximation, Gibbs estimators

Rights: Copyright © 2013 Bernoulli Society for Mathematical Statistics and Probability
