I have two NumPy arrays, X and Y, in which each row represents a point vector.
I would like to find the squared Euclidean distance (I'll call this 'dist') between each point in X and each point in Y.
The output should be a matrix D where D(i,j) is dist(X(i), Y(j)).
I have the following Python code, based on http://nonconditional.com/2014/04/on-the-trick-for-computing-the-squared-euclidian-distances-between-two-sets-of-vectors/ :
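    import numpy as np

    def get_sq_distances(X, Y):
        # Squared norms of the rows of X, repeated across the columns
        a = np.sum(np.square(X), axis=1, keepdims=True)
        b = np.ones((1, Y.shape[0]))
        c = a.dot(b)
        # Add the squared norms of the rows of Y, repeated down the rows
        a = np.ones((X.shape[0], 1))
        b = np.sum(np.square(Y), axis=1, keepdims=True).T
        c += a.dot(b)
        # Subtract twice the cross term X Y^T
        c -= 2 * X.dot(Y.T)
        return c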
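For reference, the trick rests on the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y. The following is only a minimal sketch that checks this expansion against an explicit double loop on small random inputs, using broadcasting instead of the `np.ones` products. The names `x2`, `y2`, `D_fast` and `D_loop` are illustrative and not taken from the linked post.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # 4 points in 3 dimensions
Y = rng.normal(size=(5, 3))  # 5 points in 3 dimensions

# Expansion ||x||^2 + ||y||^2 - 2 x.y, with broadcasting doing the repetition
x2 = np.sum(X**2, axis=1)[:, None]  # shape (4, 1)
y2 = np.sum(Y**2, axis=1)[None, :]  # shape (1, 5)
D_fast = x2 + y2 - 2 * X @ Y.T      # shape (4, 5)

# Reference: explicit double loop over all pairs of points
D_loop = np.array([[np.sum((x - y) ** 2) for y in Y] for x in X])

print(np.allclose(D_fast, D_loop))
```

Because of floating-point cancellation, the expanded form can return tiny negative values for points that are nearly identical, so a comparison with a tolerance is used here rather than exact equality.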
I'm trying to avoid loops (should I?) and use matrix multiplication to keep the computation fast.
However, I get a "Memory Error" on large arrays. Is there a better way to do this?
SciPy has the cdist function, which does exactly what you want:
    from scipy.spatial import distance
    distance.cdist(X, Y, 'sqeuclidean')
The SciPy docs for cdist have some good examples.
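Note that cdist still returns the full len(X) by len(Y) matrix, so if the result itself is what exhausts memory, one option is to compute it a block of rows at a time and reduce each block before moving on. The following is a rough sketch under that assumption, not a definitive implementation: the helper name `sq_distance_blocks` and the `block_rows` parameter are hypothetical, and the nearest-neighbour reduction is just one example of what could be done with each block.

```python
import numpy as np
from scipy.spatial import distance


def sq_distance_blocks(X, Y, block_rows=1024):
    """Yield (start, stop, D_block), where D_block equals D[start:stop, :]."""
    for start in range(0, X.shape[0], block_rows):
        stop = min(start + block_rows, X.shape[0])
        # Only block_rows x len(Y) distances are held in memory at once
        yield start, stop, distance.cdist(X[start:stop], Y, 'sqeuclidean')


# Example: index of the nearest point in Y for every point in X,
# computed without ever storing the full distance matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
Y = rng.normal(size=(400, 3))

nearest = np.empty(X.shape[0], dtype=np.intp)
for start, stop, D_block in sq_distance_blocks(X, Y):
    nearest[start:stop] = D_block.argmin(axis=1)
```

The block size trades memory for the number of cdist calls; a loop over a few large blocks costs little compared with the distance computation itself.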