I need a tool to quickly calculate the Wasserstein distance between two two-dimensional point sets. I have been using Gudhi, but it is too slow for my needs, so I am looking for a faster alternative. I found the geomloss library, which appears to be fast enough, but its results differ from Gudhi's. For example,
from gudhi.wasserstein import wasserstein_distance
import numpy as np
dgm1 = np.array([[2.7, 3.7],[9.6, 14.],[34.2, 34.974]])
dgm2 = np.array([[2.8, 4.45],[9.5, 14.1]])
wasserstein_distance(dgm1, dgm2, order=1)
yields 1.2369999999999965, while
import torch
from geomloss import SamplesLoss
I1 = torch.Tensor(dgm1)
I2 = torch.Tensor(dgm2)
I1.requires_grad_()
loss = SamplesLoss(loss='sinkhorn', debias=False, p=1, blur=1e-3, scaling=0.999, backend='auto')
loss(I1, I2)
yields tensor(12.9882, grad_fn=<SelectBackward0>). I don't expect the two results to match perfectly, but a ten-fold difference is too much.
I would highly appreciate it if anyone could help me either make geomloss yield a result similar to gudhi's, or find an alternative that gives a result similar to gudhi's.
I got comments from the developers on GitHub (please see this issue for more information). Here is a short summary:
- What gudhi calls wasserstein is not the same as what geomloss calls Wasserstein; this seems to be a discrepancy in terminology between the OT and TDA communities.
- gudhi computes a non-standard variation (documented here) that is specifically tailored for persistence diagrams.
- The gudhi version of wasserstein is not implemented in geomloss, but it may be possible to see it there in the future.
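To make the difference concrete, here is a minimal sketch of the persistence-diagram variant. It rests on two assumptions of mine that the developers did not state: every point may be matched to the diagonal as well as to a point of the other diagram, and the ground metric is L-infinity, which I believe corresponds to gudhi's default internal_p. The function name diagram_wasserstein is hypothetical. It uses SciPy's linear_sum_assignment and nothing from gudhi or geomloss.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def diagram_wasserstein(X, Y, order=1.0):
    """Sketch of a persistence-diagram Wasserstein distance (hypothetical helper).

    Assumes an L-infinity ground metric and birth <= death for every point.
    """
    X = np.asarray(X, dtype=float).reshape(-1, 2)
    Y = np.asarray(Y, dtype=float).reshape(-1, 2)
    n, m = len(X), len(Y)

    # L-infinity distance from each point to the diagonal {birth == death}.
    diag_X = (X[:, 1] - X[:, 0]) / 2.0
    diag_Y = (Y[:, 1] - Y[:, 0]) / 2.0

    # Square cost matrix: rows are the n points of X plus m diagonal slots,
    # columns are the m points of Y plus n diagonal slots.
    C = np.zeros((n + m, m + n))
    C[:n, :m] = cdist(X, Y, metric="chebyshev") ** order  # point to point
    C[:n, m:] = (diag_X ** order)[:, None]                # X point to diagonal
    C[n:, :m] = (diag_Y ** order)[None, :]                # Y point to diagonal
    # The bottom-right block stays 0: diagonal matched to diagonal is free.

    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum() ** (1.0 / order)


dgm1 = np.array([[2.7, 3.7], [9.6, 14.0], [34.2, 34.974]])
dgm2 = np.array([[2.8, 4.45], [9.5, 14.1]])
result = diagram_wasserstein(dgm1, dgm2, order=1)
print(result)
```

On the two diagrams from my example this should come out near 1.237, in line with the gudhi result. The third point of dgm1 is matched to the diagonal instead of being spread over the points of dgm2, which is what a plain optimal-transport loss between the two point clouds would do. An exact assignment like this scales roughly cubically with the number of points, so it illustrates the definition and is not the fast alternative I am looking for.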