Open Access
2019 Hybrid Wasserstein distance and fast distribution clustering
Isabella Verdinelli, Larry Wasserman
Electron. J. Statist. 13(2): 5088-5119 (2019). DOI: 10.1214/19-EJS1639

Abstract

We define a modified Wasserstein distance for distribution clustering that inherits many of the properties of the Wasserstein distance but can be estimated easily and computed quickly. The modified distance is the sum of two terms. The first term, which has a closed form, measures the location-scale differences between the distributions. The second term is an approximation that measures the remaining distance after accounting for location-scale differences. We consider several forms of approximation, with our main emphasis on a tangent space approximation that can be estimated using nonparametric regression and leads to fast and easy computation of barycenters, which would otherwise be very difficult to compute. We evaluate the strengths and weaknesses of this approach on simulated and real examples.
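The two-term structure described in the abstract can be illustrated with a minimal one-dimensional sketch. It is not the authors' estimator. It assumes that the closed-form term is the squared 2-Wasserstein distance between Gaussians matched to each sample's mean and standard deviation, and it computes the remaining term directly from empirical quantiles of the standardized samples (feasible in one dimension) rather than using the tangent space approximation the abstract emphasizes. The function names and the `n_grid` parameter are hypothetical.

```python
import numpy as np


def location_scale_term(x, y):
    """Closed-form part (assumed form): squared 2-Wasserstein distance
    between 1D Gaussians with the sample means and standard deviations."""
    return (x.mean() - y.mean()) ** 2 + (x.std() - y.std()) ** 2


def standardized_w2_squared(x, y, n_grid=200):
    """Residual part: squared 2-Wasserstein distance between the
    standardized samples, approximated by comparing empirical
    quantiles on a common probability grid."""
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    probs = (np.arange(n_grid) + 0.5) / n_grid  # midpoints in (0, 1)
    return np.mean((np.quantile(zx, probs) - np.quantile(zy, probs)) ** 2)


def hybrid_distance_squared(x, y):
    """Sum of the location-scale term and the standardized residual term."""
    return location_scale_term(x, y) + standardized_w2_squared(x, y)


rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + 3.0          # affine image of x: same shape, new location and scale
z = rng.exponential(size=500)  # different shape

print(hybrid_distance_squared(x, y))  # residual term is essentially zero here
print(hybrid_distance_squared(x, z))  # residual term picks up the shape difference
```

In this sketch, two samples that differ only by an affine map are separated entirely by the first term, while the second term responds only to differences in shape after standardization.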

Citation

Isabella Verdinelli, Larry Wasserman. "Hybrid Wasserstein distance and fast distribution clustering." Electron. J. Statist. 13 (2) 5088-5119, 2019. https://doi.org/10.1214/19-EJS1639

Information

Received: 1 December 2018; Published: 2019
First available in Project Euclid: 12 December 2019

zbMATH: 07147372
MathSciNet: MR4041703
Digital Object Identifier: 10.1214/19-EJS1639

Subjects:
Primary: 62G99
Secondary: 62H30

Keywords: clustering, Wasserstein

Vol. 13 • No. 2 • 2019