python numpy multidimensional-array scipy distance

Finding the Minkowski distance between two multidimensional arrays in Python


I have a DataFrame 'df' from which I want to extract values into two separate arrays, each holding 3D points (one row per point). I then want to compute the Minkowski distance between the two arrays row by row, for the whole dataset and for several values of p, and append the results to the original DataFrame as one column per p. However, I can't get my function to work properly.

My df looks like:

    x1         y1       z1        x2        y2        z2
0  0.040928  0.250813  0.258730  0.050584  0.298290  0.273055
1  0.000000  0.174905  0.228518  0.011435  0.215528  0.233548
2  0.990905  0.746038  0.790401  0.972913  0.755414  0.822155
3  0.914052  0.669185  0.707238  0.922316  0.676172  0.734213
4  0.909504  0.480774  0.484074  0.915810  0.503221  0.489242

Then I defined two arrays, p1 and p2, as:

p1 = df[["x1", "y1", "z1"]].to_numpy() 
p2 = df[["x2", "y2", "z2"]].to_numpy() 

Now I want to calculate the Minkowski distance between the two arrays for different values of p:

from math import sqrt
 
# calculate minkowski distance
def minkowski_distance(a, b, p):
    return sum(abs(e1-e2)**p for e1, e2 in zip(a,b))**(1/p)

dist = minkowski_distance(p1,p2, 2)
dist
array([13.0317225 ,  9.36364486,  7.56526207])

I want my resultant data frame to look like:

x1  y1  z1  x2  y2  z2  m(1)  m(2)  m(3) ...

where m(1) is the Minkowski distance for p=1, and so on. Each row of this data frame should hold the distance computed for that row's pair of points, i.e.

(x1, y1, z1) <---------m--------> (x2,y2,z2)

Solution

  • You could calculate the Minkowski distance in a vectorised way, summing over axis=1 so that each row yields one distance:

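    import numpy as np

    def minkowski_distance(a, b, p=2):
        # axis=1 sums over each row's coordinates: one distance per row
        return np.sum(np.abs(a - b)**p, axis=1)**(1/p)
    
    # add one column per p value: m(1), m(2), m(3)
    for p in range(1, 4):
        df[f'm({p})'] = minkowski_distance(p1, p2, p)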
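
The version in the question most likely returned only three numbers because zip(a, b) iterates over rows, so the built-in sum adds the terms across rows and leaves one value per column instead of one per row. Below is a minimal self-contained sketch of the vectorised approach. It assumes the five sample rows shown in the question stand in for the full dataset, and the cross-check against np.linalg.norm is added here for illustration rather than taken from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data: the five rows shown in the question (the real dataset is larger)
df = pd.DataFrame({
    "x1": [0.040928, 0.000000, 0.990905, 0.914052, 0.909504],
    "y1": [0.250813, 0.174905, 0.746038, 0.669185, 0.480774],
    "z1": [0.258730, 0.228518, 0.790401, 0.707238, 0.484074],
    "x2": [0.050584, 0.011435, 0.972913, 0.922316, 0.915810],
    "y2": [0.298290, 0.215528, 0.755414, 0.676172, 0.503221],
    "z2": [0.273055, 0.233548, 0.822155, 0.734213, 0.489242],
})

# Each array has shape (n_rows, 3): one 3D point per row
p1 = df[["x1", "y1", "z1"]].to_numpy()
p2 = df[["x2", "y2", "z2"]].to_numpy()


def minkowski_distance(a, b, p=2):
    # axis=1 sums over the coordinates of each row, giving one distance per row
    return np.sum(np.abs(a - b) ** p, axis=1) ** (1 / p)


# One new column per p value
for p in range(1, 4):
    df[f"m({p})"] = minkowski_distance(p1, p2, p)

# Cross-check: for p=2 the result should equal the Euclidean norm of each row difference
assert np.allclose(df["m(2)"], np.linalg.norm(p1 - p2, axis=1))

print(df[["m(1)", "m(2)", "m(3)"]])
```

For a single pair of 1-D vectors, scipy.spatial.distance.minkowski(u, v, p) computes the same quantity, which can be handy for spot-checking individual rows.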