Search code examples
pythonnumpymatrixscipyeuclidean-distance

Eucledian distance matrix between two matrices


I have the following function that calculates the eucledian distance between all combinations of the vectors in Matrix A and Matrix B

def distance_matrix(A,B):
    n=A.shape[1]
    m=B.shape[1]

    C=np.zeros((n,m))

    for ai, a in enumerate(A.T):
        for bi, b in enumerate(B.T): 
            C[ai][bi]=np.linalg.norm(a-b)
    return C

This works fine and creates an n*m-Matrix from a d*n-Matrix and a d*m-Matrix containing the eucledian distance between all combinations of the column vectors.

>>> print(A)
[[-1 -1  1  1  2]
 [ 1 -1  2 -1  1]] 
>>> print(B)
[[-2 -1  1  2]
 [-1  2  1 -1]]
>>> print(distance_matrix(A,B))
[[2.23606798 1.         2.         3.60555128]
 [1.         3.         2.82842712 3.        ]
 [4.24264069 2.         1.         3.16227766]
 [3.         3.60555128 2.         1.        ]
 [4.47213595 3.16227766 1.         2.        ]]

I spent some time looking for a numpy or scipy function to achieve this in a more efficient way. Is there such a function or what would be the vecotrized way to do this?


Solution

  • Yes, you can broadcast your vectors:

    A = np.array([[-1, -1,  1,  1,  2], [ 1, -1,  2, -1,  1]])
    B = np.array([[-2, -1,  1,  2], [-1,  2,  1, -1]])
    
    C = np.linalg.norm(A.T[:, None, :] - B.T[None, :, :], axis=-1)
    print(C)
    
    array([[2.23606798, 1.        , 2.        , 3.60555128],
           [1.        , 3.        , 2.82842712, 3.        ],
           [4.24264069, 2.        , 1.        , 3.16227766],
           [3.        , 3.60555128, 2.        , 1.        ],
           [4.47213595, 3.16227766, 1.        , 2.        ]])
    

    You can get an explanation of how it works here:

    https://sparrow.dev/pairwise-distance-in-numpy/