Abstract
Given a pair of random vectors $(X,Y)$, we consider the problem of approximating $Y$ by $\mathbf{c}(X)=\{\mathbf{c}_{1}(X),\dots ,\mathbf{c}_{M}(X)\}$, where $\mathbf{c}$ is a measurable set-valued function. We give meaning to this approximation using the principles of vector quantization, which leads to the definition of a multifunction regression problem. The problem so formulated amounts to quantizing the conditional distributions of $Y$ given $X$. We propose a nonparametric estimate of the solutions of the multifunction regression problem that combines $M$-means clustering with the nonparametric smoothing technique of $k$-nearest neighbors. We provide an asymptotic analysis of the estimate and derive a convergence rate for its excess risk. The proposed methodology is illustrated on simulated examples and on a speed-flow traffic data set from road traffic forecasting.
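The combination of $k$-nearest neighbors and $M$-means described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `knn_quantizer`, its parameters, the Euclidean metric, and the use of plain Lloyd iterations with random initialization are all assumptions made here for illustration. For a query point `x`, it selects the `k` sample points whose `X`-values are closest to `x` and clusters the corresponding `Y`-values into `M` centers, which play the role of $\mathbf{c}_{1}(x),\dots ,\mathbf{c}_{M}(x)$.

```python
import numpy as np


def knn_quantizer(X, Y, x, k, M, n_iter=50, seed=0):
    """Hypothetical helper: M-means codebook of the Y-values whose
    X-values are among the k nearest neighbors of the query x."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    x = np.asarray(x, dtype=float).reshape(-1)
    k = min(k, len(X))

    # Step 1: k-nearest neighbors of x (Euclidean distance, an assumption).
    dist_to_x = np.linalg.norm(X - x, axis=1)
    neighbors = np.argsort(dist_to_x)[:k]
    Yk = Y[neighbors]

    # Step 2: M-means (Lloyd iterations) on the neighbors' Y-values.
    # Initial centers are M distinct neighbors chosen at random.
    centers = Yk[rng.choice(k, size=M, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each Y-value to its closest center.
        d = np.linalg.norm(Yk[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its cell; keep empty cells as-is.
        new_centers = centers.copy()
        for m in range(M):
            members = Yk[labels == m]
            if len(members) > 0:
                new_centers[m] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers  # shape (M, dim of Y)
```

When the conditional distribution of $Y$ given $X=x$ is bimodal, a call such as `knn_quantizer(X, Y, x=0.5, k=200, M=2)` is intended to return one center near each mode, whereas a single conditional mean would fall between them.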
Citation
Jean-Michel Loubes, Bruno Pelletier. "Prediction by quantization of a conditional distribution." Electron. J. Statist. 11 (1) 2679 - 2706, 2017. https://doi.org/10.1214/17-EJS1296
Information