
Python numpy Quick Array Calculations to 2D Array (matrix)


I have a use case where I have a set of several thousand coordinates, and I want to vectorize them and turn them into distances. I want to do this in such a way that I end up with a 2D array, effectively a matrix, that is n x n, giving me the norm between the input points. I know that I'll have a pile of zeros along the diagonal, and that's fine. I want to process it as quickly as reasonably possible.

Currently I load the coordinates into a numpy array in which each row is one x, y, z point; the number of rows is however many points are loaded from a file, for instance 5000.

I'm currently just looping through the list of coordinates as:

for i in range(n):
    for j in range(n):
        dist[i,j] = round(numpy.linalg.norm(coords[i] - coords[j]), 3)

dist is a numpy array set up with numpy.zeros((n,n)), where n is the length of the coords list, which I already have.

I know there must be a faster way to use numpy on this dataset (making coords an array, of course); I am just not sure how to do this efficiently. Part of the reason I want to do this is that I intend to use a truth table mask against the result for data processing. Thanks!


Solution

  • The solution is simple: import scipy.spatial.distance and use:

    distances = scipy.spatial.distance.cdist(coords, coords)
    

    The resulting array is the n by n array of Euclidean distances between every pair of points, with zeros along the diagonal.
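
For reference, here is a minimal sketch of how this might fit the use case in the question. The function names, the random test data, and the 0.25 cutoff are made up for illustration. It rounds to 3 decimals as the original loop does, shows a numpy-only broadcasting alternative, and builds a boolean (truth table) mask from the result.

```python
import numpy as np
from scipy.spatial.distance import cdist


def pairwise_distances(coords, decimals=3):
    """n x n Euclidean distance matrix via scipy, rounded like the loop in the question."""
    coords = np.asarray(coords, dtype=float)
    return np.round(cdist(coords, coords), decimals)


def pairwise_distances_numpy(coords, decimals=3):
    """Same result using only numpy broadcasting (no scipy needed)."""
    coords = np.asarray(coords, dtype=float)
    # diff[i, j] = coords[i] - coords[j], shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.round(np.linalg.norm(diff, axis=-1), decimals)


# Hypothetical stand-in for coordinates loaded from a file.
rng = np.random.default_rng(0)
coords = rng.random((200, 3))

dist = pairwise_distances(coords)
dist_np = pairwise_distances_numpy(coords)

# Boolean mask: True where two points are closer than an arbitrary cutoff.
within_cutoff = dist < 0.25
```

Note that the broadcasting version allocates an intermediate (n, n, 3) array, which for 5000 points in float64 is roughly 600 MB, so cdist is likely the safer choice at that size.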