Search code examples
pythonnumpymatrixeuclidean-distanceadjacency-matrix

Adjacency Matrix from Numpy array using Euclidean Distance


Can someone help me please on how to generate a weighted adjacency matrix from a numpy array based on euclidean distance between all rows, i.e 0 and 1, 0 and 2,.. 1 and 2,...?

Given the following example with an input matrix(5, 4):

matrix = [[2,10,9,6],
          [5,1,4,7],
          [3,2,1,0], 
          [10, 20, 1, 4], 
          [17, 3, 5, 18]]

I would like to obtain a weighted adjacency matrix (5,5) containing the most minimal distance between nodes, i.e,

if dist(row0, row1)= 10,77 and dist(row0, row2)= 12,84, 

--> the output matrix will take the first distance as a column value. 

I have already solved the first part for the generation of the adjacency matrix with the following code :

from scipy.spatial.distance import cdist
dist = cdist( matrix, matrix, metric='euclidean')

and I get the following result :

array([[ 0.        , 10.77032961, 12.84523258, 15.23154621, 20.83266666],
       [10.77032961,  0.        ,  7.93725393, 20.09975124, 16.43167673],
       [12.84523258,  7.93725393,  0.        , 19.72308292, 23.17326045],
       [15.23154621, 20.09975124, 19.72308292,  0.        , 23.4520788 ],
       [20.83266666, 16.43167673, 23.17326045, 23.4520788 ,  0.        ]])

But I don't know yet how to specify the number of neighbors for which we select for example 2 neighbors for each node. For example, we define the number of neighbors N = 2, then for each row, we choose only two neighbors with the two minimum distances and we get as a result :

[[ 0.        , 10.77032961, 12.84523258, 0, 0],
       [10.77032961,  0.        ,  7.93725393, 0, 0],
       [12.84523258,  7.93725393,  0.        , 0, 0],
       [15.23154621, 0, 19.72308292,  0.        , 0 ],
       [20.83266666, 16.43167673, 0, 0 ,  0.        ]]

Solution

  • You can use this cleaner solution to get the smallest n from a matrix. Try the following -

    The dist.argsort(1).argsort(1) creates a rank order (smallest is 0 and largest is 4) over axis=1 and the <= 2 decided the number of nsmallest values you need from the rank order. np.where filters it or replaces it with 0.

    np.where(dist.argsort(1).argsort(1) <= 2, dist, 0)
    
    array([[ 0.        , 10.77032961, 12.84523258,  0.        ,  0.        ],
           [10.77032961,  0.        ,  7.93725393,  0.        ,  0.        ],
           [12.84523258,  7.93725393,  0.        ,  0.        ,  0.        ],
           [15.23154621,  0.        , 19.72308292,  0.        ,  0.        ],
           [20.83266666, 16.43167673,  0.        ,  0.        ,  0.        ]])
    

    This works for any axis or if you want nlargest or nsmallest from a matrix as well.