Abstract
Given two probability measures, $\mathbb{P}$ and $\mathbb{Q}$ defined on a measurable space, $S$, the integral probability metric (IPM) is defined as $$\gamma_{\mathcal{F}}(\mathbb{P},\mathbb{Q})=\sup\left\{\left\vert \int_{S}f\,d\mathbb{P}-\int_{S}f\,d\mathbb{Q}\right\vert\,:\,f\in\mathcal{F}\right\},$$ where $\mathcal{F}$ is a class of real-valued bounded measurable functions on $S$. By appropriately choosing $\mathcal{F}$, various popular distances between $\mathbb{P}$ and $\mathbb{Q}$, including the Kantorovich metric, Fortet-Mourier metric, dual-bounded Lipschitz distance (also called the Dudley metric), total variation distance, and kernel distance, can be obtained.
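To make the phrase "appropriately choosing $\mathcal{F}$" concrete, several of the distances named above correspond to the following standard function classes. This is a sketch under assumed notation that the abstract itself does not fix: $(S,\rho)$ is a metric space, $\Vert f\Vert_{L}$ is the Lipschitz semi-norm, $\Vert f\Vert_{\infty}$ is the supremum norm, and $\mathcal{H}$ is a reproducing kernel Hilbert space on $S$. $$\begin{aligned}\text{Kantorovich metric:}\quad & \mathcal{F}=\{f:\Vert f\Vert_{L}\le 1\},\qquad \Vert f\Vert_{L}=\sup_{x\neq y}\frac{\vert f(x)-f(y)\vert}{\rho(x,y)},\\ \text{Dudley metric:}\quad & \mathcal{F}=\{f:\Vert f\Vert_{BL}\le 1\},\qquad \Vert f\Vert_{BL}=\Vert f\Vert_{\infty}+\Vert f\Vert_{L},\\ \text{total variation distance:}\quad & \mathcal{F}=\{f:\Vert f\Vert_{\infty}\le 1\},\\ \text{kernel distance:}\quad & \mathcal{F}=\{f:\Vert f\Vert_{\mathcal{H}}\le 1\}.\end{aligned}$$ With the choice $\Vert f\Vert_{\infty}\le 1$, the total variation distance equals twice the value obtained under the supremum-over-measurable-sets convention.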
In this paper, we consider the problem of estimating $\gamma_{\mathcal{F}}$ from finite random samples drawn i.i.d. from $\mathbb{P}$ and $\mathbb{Q}$. Although the above-mentioned distances cannot be computed in closed form for every $\mathbb{P}$ and $\mathbb{Q}$, we show their empirical estimators to be easily computable and strongly consistent (except for the total variation distance). We further analyze their rates of convergence. Based on these results, we discuss the advantages of certain choices of $\mathcal{F}$ (and therefore the corresponding IPMs) over others. In particular, the kernel distance is shown to have three favorable properties compared with the other mentioned distances: it is computationally cheaper, the empirical estimate converges at a faster rate to the population value, and the rate of convergence is independent of the dimension $d$ of the space (for $S=\mathbb{R}^{d}$). We also provide a novel interpretation of IPMs and their empirical estimators by relating them to the problem of binary classification: while the IPM between class-conditional distributions is the negative of the optimal risk associated with a binary classifier, the smoothness of an appropriate binary classifier (e.g., support vector machine, Lipschitz classifier, etc.) is inversely related to the empirical estimator of the IPM between these class-conditional distributions.
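As an illustration of why the kernel distance is cheap to estimate, the following is a minimal sketch of its plug-in empirical estimator. For samples $X_1,\dots,X_m\sim\mathbb{P}$ and $Y_1,\dots,Y_n\sim\mathbb{Q}$ it evaluates $\sqrt{\sum_{i,j}w_i w_j k(Z_i,Z_j)}$, where $Z$ is the pooled sample and $w_i$ is $1/m$ for points from $\mathbb{P}$ and $-1/n$ for points from $\mathbb{Q}$. The Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions chosen for illustration; the abstract does not prescribe them.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def empirical_kernel_distance(x, y, sigma=1.0):
    """Plug-in estimate of the kernel distance between two samples.

    x has shape (m, d) and y has shape (n, d). The Gaussian kernel is an
    illustrative assumption, not a choice fixed by the source.
    """
    m, n = len(x), len(y)
    z = np.vstack([x, y])  # pooled sample
    # Signed weights: +1/m for points from P, -1/n for points from Q.
    w = np.concatenate([np.full(m, 1.0 / m), np.full(n, -1.0 / n)])
    k = gaussian_kernel(z, z, sigma)
    # The quadratic form is non-negative in exact arithmetic; guard rounding.
    return float(np.sqrt(max(w @ k @ w, 0.0)))


rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(200, 2))
same_b = rng.normal(0.0, 1.0, size=(200, 2))
shifted = rng.normal(3.0, 1.0, size=(200, 2))

d_same = empirical_kernel_distance(same_a, same_b)
d_shifted = empirical_kernel_distance(same_a, shifted)
```

The whole computation is one Gram matrix and one quadratic form, so the cost is quadratic in the pooled sample size and involves no optimization step. In this sketch `d_shifted` should come out clearly larger than `d_same`, since the second pair of samples is drawn from visibly different distributions.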
Citation
Bharath K. Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Gert R. G. Lanckriet. "On the empirical estimation of integral probability metrics." Electron. J. Statist. 6: 1550-1599, 2012. https://doi.org/10.1214/12-EJS722