I have a set of several thousand coordinates that I want to turn into pairwise distances, vectorized for speed. The goal is a 2D array, effectively an n x n matrix, giving the norm between every pair of input points. I know the diagonal will be all zeros, and that's fine. I want to process it as quickly as reasonably possible.
My data is a NumPy array of coordinates, one x, y, z triple per row, with as many rows as are loaded from a file (5000 rows, for instance).
I'm currently just looping through the list of coordinates as:

    for i in range(n):
        for j in range(n):
            dist[i, j] = round(numpy.linalg.norm(coords[i] - coords[j]), 3)
where dist is a NumPy array set up with numpy.zeros((n, n)), and n is the length of the coords list, which I've already got.
I know there must be a faster way to do this with NumPy on this dataset (making coords an array, of course); I'm just not sure how to do it efficiently. Part of the reason I want this as a matrix is that I intend to apply a boolean (truth table) mask against it for further data processing. Thanks!
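For reference, the nested loop above can be vectorized with plain NumPy broadcasting, with no SciPy dependency. A minimal sketch, using a small hypothetical set of 3D points in place of the file data:

```python
import numpy as np

# Hypothetical small set of 3D points standing in for the loaded file
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])

# Broadcasting: (n, 1, 3) - (1, n, 3) -> an (n, n, 3) array of differences,
# then take the norm along the last axis to get the (n, n) distance matrix.
diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
dist = np.round(np.linalg.norm(diff, axis=-1), 3)
```

Note that the intermediate diff array is n x n x 3, so for very large n the memory cost of this approach can matter.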
So the solution is as simple as suggested: use SciPy's cdist:

    from scipy.spatial.distance import cdist
    distances = cdist(coords, coords)

The resulting array is the n x n matrix of Euclidean norms between all pairs of points.
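To close the loop on the masking goal from the question, here is a minimal sketch of applying a boolean mask to the cdist result, assuming a hypothetical cutoff radius and small stand-in data:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical small set of 3D points standing in for the loaded file
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])

distances = cdist(coords, coords)  # (n, n) array of Euclidean norms

# Boolean ("truth table") mask: True where two distinct points lie
# within a cutoff; the > 0 test excludes the zero diagonal.
cutoff = 1.5  # hypothetical threshold
mask = (distances > 0) & (distances < cutoff)
```

The mask can then be used directly for fancy indexing, e.g. distances[mask] pulls out just the in-range values.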