I wanted to calculate the distance between two 3D point clouds with at least 2000 points using Earth Mover's Distance with the following code, however, it is too slow and does not work properly. So, is there any way to calculate it for approximate it faster?
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment
def emd(self):
d = cdist(self.X, self.Y)
assignment = linear_sum_assignment(d)
return d[assignment].sum() / min(len(self.X), len(self.Y))
As of SciPy 1.4.0 (released December 2019), with the work of this pull request, the implementation of linear_sum_assignment
is drastically faster. In particular, 2000x2000 geometric instances can be solved in less than 10 seconds on a dated laptop; see some of the benchmarks in the pull request and related threads.
As of SciPy 1.6.0 (released December 2020), there's also scipy.sparse.csgraph.min_weight_full_bipartite_matching
, which can be used as a drop-in replacement for linear_sum_assignment
; as the name suggests, it's intended to be used with sparse inputs; yours is of course everything but sparse, and for geometric inputs it will generally be slower, but in some situations it's worth looking into as well.