Electronic Journal of Statistics

Central limit theorems for entropy-regularized optimal transport on finite spaces and statistical applications

Abstract

The notion of entropy-regularized optimal transport, also known as Sinkhorn divergence, has recently gained popularity in machine learning and statistics, as it makes feasible the use of smoothed optimal transportation distances for data analysis. The Sinkhorn divergence allows the fast computation of an entropically regularized Wasserstein distance between two probability distributions supported on a finite metric space of (possibly) high dimension. For data sampled from one or two unknown probability distributions, we derive the distributional limits of the empirical Sinkhorn divergence and its centered version (Sinkhorn loss). We also propose a bootstrap procedure that yields new test statistics for measuring the discrepancies between multivariate probability distributions. Our work is inspired by the results of Sommerfeld and Munk [33] on the asymptotic distribution of the empirical Wasserstein distance on finite spaces using unregularized transportation costs. Incidentally, we also analyze the asymptotic distribution of entropy-regularized Wasserstein distances when the regularization parameter tends to zero. Simulated and real datasets are used to illustrate our approach.
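
As an illustration of the quantities named in the abstract, the following is a minimal sketch of Sinkhorn iterations on a finite space, written in pure Python. It is not taken from the paper: the function names (`sinkhorn_plan`, `regularized_cost`, `sinkhorn_loss`), the fixed iteration count, and the choice of regularizer (relative entropy of the plan with respect to the product of the marginals) are assumptions made for this example, and the paper's exact definitions may differ by normalization or constants. The centered quantity is assumed here to subtract half of each self-term.

```python
import math


def sinkhorn_plan(a, b, C, eps, n_iter=1000):
    """Approximate the entropy-regularized transport plan between histograms a and b.

    a, b : lists of nonnegative weights summing to 1
    C    : cost matrix as a list of lists, C[i][j] = cost of moving mass from i to j
    eps  : regularization parameter (> 0)
    """
    n, m = len(a), len(b)
    # Gibbs kernel associated with the cost matrix.
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def regularized_cost(a, b, C, eps, n_iter=1000):
    """Transport cost plus eps times the relative entropy of the plan w.r.t. a x b.

    The relative-entropy regularizer is an assumption of this sketch.
    """
    T = sinkhorn_plan(a, b, C, eps, n_iter)
    total = 0.0
    for i, row in enumerate(T):
        for j, t in enumerate(row):
            if t > 0.0:  # zero entries contribute nothing (0 * log 0 = 0)
                total += t * C[i][j] + eps * t * math.log(t / (a[i] * b[j]))
    return total


def sinkhorn_loss(a, b, C, eps, n_iter=1000):
    """Centered version: subtract half of each self-term (assumed form)."""
    return regularized_cost(a, b, C, eps, n_iter) - 0.5 * (
        regularized_cost(a, a, C, eps, n_iter)
        + regularized_cost(b, b, C, eps, n_iter)
    )


if __name__ == "__main__":
    # Three points 0, 1, 2 on the line with squared-distance cost.
    C = [[0.0, 1.0, 4.0], [1.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
    a = [0.5, 0.3, 0.2]
    b = [0.2, 0.3, 0.5]
    print(sinkhorn_loss(a, b, C, eps=0.5))
```

In an empirical setting, `a` and `b` would be replaced by the observed frequency vectors of the samples. For small values of `eps` the kernel entries can underflow, so a log-domain variant of the iterations would be preferable in practice.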

Article information

Source
Electron. J. Statist., Volume 13, Number 2 (2019), 5120–5150.

Dates
First available in Project Euclid: 12 December 2019

https://projecteuclid.org/euclid.ejs/1576119711

Digital Object Identifier
doi:10.1214/19-EJS1637

Mathematical Reviews number (MathSciNet)
MR4041704

Zentralblatt MATH identifier
07147373

Citation

Bigot, Jérémie; Cazelles, Elsa; Papadakis, Nicolas. Central limit theorems for entropy-regularized optimal transport on finite spaces and statistical applications. Electron. J. Statist. 13 (2019), no. 2, 5120–5150. doi:10.1214/19-EJS1637. https://projecteuclid.org/euclid.ejs/1576119711

References

• [1] M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
• [2] J. Bigot, E. Cazelles, and N. Papadakis. Data-driven regularization of Wasserstein barycenters with an application to multivariate density registration. arXiv preprint arXiv:1804.08962, 2018.
• [3] J. Bigot, R. Gouet, T. Klein, A. López, et al. Geodesic PCA in the Wasserstein space by convex PCA. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 53(1):1–26, 2017.
• [4] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
• [5] E. Cazelles, V. Seguy, J. Bigot, M. Cuturi, and N. Papadakis. Geodesic PCA versus log-PCA of histograms in the Wasserstein space. SIAM Journal on Scientific Computing, 40(2):B429–B456, 2018.
• [6] Q. Chen and Z. Fang. Inference on functionals under first order degeneracy. SSRN, 2018.
• [7] M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems 26, pages 2292–2300, 2013.
• [8] M. Cuturi and A. Doucet. Fast computation of Wasserstein barycenters. In International Conference on Machine Learning 2014, JMLR W&CP, volume 32, pages 685–693, 2014.
• [9] M. Cuturi and G. Peyré. A smoothed dual approach for variational Wasserstein problems. SIAM Journal on Imaging Sciences, 9(1):320–343, 2016.
• [10] E. del Barrio, J. A. Cuesta-Albertos, C. Matrán, and J. M. Rodriguez-Rodriguez. Tests of goodness of fit based on the $L_2$-Wasserstein distance. Ann. Statist., 27(4):1230–1239, 1999.
• [11] E. del Barrio, E. Giné, and F. Utzet. Asymptotics for $L_2$ functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances. Bernoulli, 11(1):131–189, 2005.
• [12] E. del Barrio and J.-M. Loubes. Central limit theorems for empirical transportation cost in general dimension. arXiv:1705.01299v1, 2017.
• [13] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.
• [14] J. Feydy, T. Séjourné, F.-X. Vialard, S.-I. Amari, A. Trouvé, and G. Peyré. Interpolating between optimal transport and MMD using Sinkhorn divergences. arXiv preprint arXiv:1810.08278, 2018.
• [15] G. Freitag and A. Munk. On Hadamard differentiability in $k$-sample semiparametric models—with applications to the assessment of structural relationships. J. Multivariate Anal., 94(1):123–158, 2005.
• [16] C. Frogner, C. Zhang, H. Mobahi, M. Araya, and T. A. Poggio. Learning with a Wasserstein loss. In Advances in Neural Information Processing Systems, pages 2053–2061, 2015.
• [17] A. Genevay, L. Chizat, F. Bach, M. Cuturi, and G. Peyré. Sample complexity of Sinkhorn divergences. arXiv preprint arXiv:1810.02733, 2018.
• [18] A. Genevay, M. Cuturi, G. Peyré, and F. Bach. Stochastic optimization for large-scale optimal transport. In D. D. Lee, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Proc. NIPS’16, pages 3432–3440. Curran Associates, Inc., 2016.
• [19] A. Genevay, G. Peyré, and M. Cuturi. Sinkhorn-autodiff: Tractable Wasserstein learning of generative models. arXiv preprint arXiv:1706.00292, 2017.
• [20] A. Gramfort, G. Peyré, and M. Cuturi. Fast optimal transport averaging of neuroimaging data. In International Conference on Information Processing in Medical Imaging, pages 261–272. Springer, 2015.
• [21] M. Klatt, C. Tameling, and A. Munk. Empirical regularized optimal transport: Statistical theory and applications. arXiv preprint arXiv:1810.09880, 2018.
• [22] T.-T. Lu and S.-H. Shiou. Inverses of 2 $\times$ 2 block matrices. Computers & Mathematics with Applications, 43(1-2):119–129, 2002.
• [23] G. Luise, A. Rudi, M. Pontil, and C. Ciliberto. Differential properties of Sinkhorn approximation for learning with Wasserstein distance. In Advances in Neural Information Processing Systems, pages 5859–5870, 2018.
• [24] X. Nguyen. Convergence of latent mixing measures in finite and infinite mixture models. Ann. Statist., 41(1):370–400, 2013.
• [25] A. Olmos and F. A. Kingdom. A biologically inspired algorithm for the recovery of shading and reflectance images. Perception, 33(12):1463–1473, 2004.
• [26] J. Rabin and N. Papadakis. Convex color image segmentation with optimal transport distances. In International Conference on Scale Space and Variational Methods in Computer Vision, pages 256–269. Springer, 2015.
• [27] A. Ramdas, N. G. Trillos, and M. Cuturi. On Wasserstein two-sample testing and related families of nonparametric tests. Entropy, 19(2):47, 2017.
• [28] T. Rippl, A. Munk, and A. Sturm. Limit laws of the empirical Wasserstein distance: Gaussian distributions. J. Multivariate Anal., 151:90–109, 2016.
• [29] A. Rolet, M. Cuturi, and G. Peyré. Fast dictionary learning with a smoothed Wasserstein loss. In Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), 2016.
• [30] M. A. Schmitz, M. Heitz, N. Bonneel, F. M. N. Mboula, D. Coeurjolly, M. Cuturi, G. Peyré, and J.-L. Starck. Wasserstein dictionary learning: Optimal transport-based unsupervised non-linear dictionary learning. arXiv preprint arXiv:1708.01955, 2017.
• [31] V. Seguy and M. Cuturi. Principal geodesic analysis for probability measures under the optimal transport metric. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pages 3294–3302. Curran Associates, Inc., 2015.
• [32] J. Solomon, F. de Goes, G. Peyré, M. Cuturi, A. Butscher, A. Nguyen, T. Du, and L. Guibas. Convolutional Wasserstein distances: Efficient optimal transportation on geometric domains. In ACM Transactions on Graphics (SIGGRAPH’15), 2015.
• [33] M. Sommerfeld and A. Munk. Inference for empirical Wasserstein distances on finite spaces. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2016.
• [34] J. Sourati, M. Akcakaya, T. K. Leen, D. Erdogmus, and J. G. Dy. Asymptotic analysis of objectives based on Fisher information in active learning. Journal of Machine Learning Research, 18(34):1–41, 2017.
• [35] A. Thibault, L. Chizat, C. Dossal, and N. Papadakis. Overrelaxed Sinkhorn-Knopp algorithm for regularized optimal transport. arXiv preprint arXiv:1711.01851, 2017.
• [36] A. W. Van Der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
• [37] C. Villani. Topics in Optimal Transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, 2003.
• [38] L. Wasserman. All of Statistics: A Concise Course in Statistical Inference. Springer Science & Business Media, 2011.
• [39] A. G. Wilson. The use of entropy maximising models, in the theory of trip distribution, mode split and route split. Journal of Transport Economics and Policy, pages 108–126, 1969.
• [40] J. Ye, P. Wu, J. Z. Wang, and J. Li. Fast discrete distribution clustering using Wasserstein barycenter with sparse support. IEEE Trans. Signal Processing, 65(9):2317–2332, 2017.
• [41] C. Zalinescu. Convex Analysis in General Vector Spaces. World Scientific, 2002.