Abstract
Given a finite class of functions $F$, the problem of aggregation is to construct a procedure with a risk as close as possible to the risk of the best element in the class. A classical procedure (PAC-Bayesian statistical learning theory (2004) Paris 6, Statistical Learning Theory and Stochastic Optimization (2001) Springer, Ann. Statist. 28 (2000) 75–87) is the aggregate with exponential weights (AEW), defined by
\[\tilde{f}^{\mathrm{AEW}}=\sum_{f\in F}\widehat{\theta}(f)f,\qquad\mbox{where }\widehat{\theta}(f)=\frac{\exp(-({n}/{T})R_{n}(f))}{\sum_{g\in F}\exp(-({n}/{T})R_{n}(g))},\]
where $T>0$ is called the temperature parameter and $R_{n}(\cdot)$ is an empirical risk.
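As an illustration of the definition above, the following is a minimal sketch of how the AEW weights $\widehat{\theta}(f)$ and the aggregate $\tilde{f}^{\mathrm{AEW}}$ could be computed for a finite dictionary. It assumes, for illustration only, that $R_{n}$ is the empirical quadratic risk $R_{n}(f)=\frac{1}{n}\sum_{i=1}^{n}(Y_{i}-f(X_{i}))^{2}$ on a sample $(X_{i},Y_{i})_{i=1}^{n}$. The helper names (`aew_weights`, `empirical_quadratic_risk`, `aew_aggregate`) and the toy data are hypothetical and not taken from the article.

```python
import math
from typing import Callable, Sequence

Func = Callable[[float], float]


def aew_weights(risks: Sequence[float], n: int, T: float) -> list[float]:
    """Exponential weights theta(f) proportional to exp(-(n/T) * R_n(f)).

    Subtracting the smallest empirical risk before exponentiating multiplies
    every unnormalized weight by the same constant, so the normalized weights
    are unchanged, but it avoids underflow when T is small.
    """
    if T <= 0:
        raise ValueError("temperature T must be positive")
    r_min = min(risks)
    unnorm = [math.exp(-(n / T) * (r - r_min)) for r in risks]
    total = sum(unnorm)  # >= 1, since the minimizer contributes exp(0) = 1
    return [u / total for u in unnorm]


def empirical_quadratic_risk(f: Func, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Assumed empirical risk: mean squared residual of f on the sample."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def aew_aggregate(dictionary: Sequence[Func], xs: Sequence[float],
                  ys: Sequence[float], T: float) -> tuple[Func, list[float]]:
    """Return the convex combination sum_f theta(f) f and the weights theta."""
    n = len(xs)
    risks = [empirical_quadratic_risk(f, xs, ys) for f in dictionary]
    weights = aew_weights(risks, n, T)

    def f_aew(x: float) -> float:
        return sum(w * f(x) for w, f in zip(weights, dictionary))

    return f_aew, weights


if __name__ == "__main__":
    # Hypothetical toy sample lying close to y = x.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [0.1, 0.4, 1.1, 1.4, 2.1]
    dictionary = [lambda x: 0.0, lambda x: x, lambda x: 2.0 * x]
    for T in (10.0, 1.0, 0.1):
        _, w = aew_aggregate(dictionary, xs, ys, T)
        print(f"T={T}: weights={[round(v, 4) for v in w]}")
```

Running the toy loop shows the general effect of the temperature: for large $T$ the weights are spread over the dictionary, while for small $T$ essentially all of the mass sits on the element with the smallest empirical risk. This is a property of the formula itself; the article's results concern how well the resulting aggregate performs in each regime.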
In this article, we study the optimality of AEW in the regression model with random design and in the low-temperature regime. We prove three properties of AEW. First, we show that AEW is a suboptimal aggregation procedure in expectation with respect to the quadratic risk when $T\leq c_{1}$, where $c_{1}$ is an absolute positive constant (the low-temperature regime), and that it is suboptimal in probability even for high temperatures. Second, we show that as the cardinality of the dictionary $F$ grows, the behavior of AEW might deteriorate: in the low-temperature regime, it might concentrate with high probability around elements of the dictionary whose risk exceeds the risk of the best function in the dictionary by at least a term of order $1/\sqrt{n}$. Third, we prove that if a geometric condition on the dictionary (the so-called “Bernstein condition”) is assumed, then AEW is indeed optimal, both in high probability and in expectation, in the low-temperature regime. Moreover, under that assumption, the complexity term is essentially the logarithm of the cardinality of the set of “almost minimizers” rather than the logarithm of the cardinality of the entire dictionary. This result holds for small values of the temperature parameter, thus complementing an analogous result for high temperatures.
Citation
Guillaume Lecué. Shahar Mendelson. "On the optimality of the aggregate with exponential weights for low temperatures." Bernoulli 19 (2) 646–675, May 2013. https://doi.org/10.3150/11-BEJ408