I have a dataframe 'df' from which I want to extract values into two arrays, each holding one 3D point (x, y, z) per row. I then want to compute the Minkowski distance between the two arrays for every row in the dataset and append the results (one column per value of p) to the original data frame. But I'm not able to write the function properly.
My df looks like:
x1 y1 z1 x2 y2 z2
0 0.040928 0.250813 0.258730 0.050584 0.298290 0.273055
1 0.000000 0.174905 0.228518 0.011435 0.215528 0.233548
2 0.990905 0.746038 0.790401 0.972913 0.755414 0.822155
3 0.914052 0.669185 0.707238 0.922316 0.676172 0.734213
4 0.909504 0.480774 0.484074 0.915810 0.503221 0.489242
Then I defined two arrays, p1 and p2, as:
p1 = df[["x1", "y1", "z1"]].to_numpy()
p2 = df[["x2", "y2", "z2"]].to_numpy()
Now I want to calculate the Minkowski distance between the two arrays for different values of p:
from math import sqrt
# calculate minkowski distance
def minkowski_distance(a, b, p):
    return sum(abs(e1-e2)**p for e1, e2 in zip(a,b))**(1/p)
dist = minkowski_distance(p1,p2, 2)
dist
array([13.0317225 , 9.36364486, 7.56526207])
I want my resulting data frame to look like:
x1 y1 z1 x2 y2 z2 m(1) m(2) m(3) ...
where m(1) represents the Minkowski distance for p=1, and so on. Each row of this data frame should hold the distances computed for that row's own pair of points, i.e.
(x1, y1, z1) <---------m--------> (x2,y2,z2)
You could try to calculate Minkowski distance in a vectorised way:
import numpy as np

def minkowski_distance(a, b, p=2):
    # Sum over axis=1 (the x, y, z coordinates) so each row gives one distance
    return np.sum(np.abs(a - b)**p, axis=1)**(1/p)

for p in range(1, 4):
    df[f'm({p})'] = minkowski_distance(p1, p2, p)
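As for why the original function returned only three numbers: zip(a, b) pairs up whole rows, so the built-in sum adds the per-row arrays together and you end up with one value per column instead of one per row. Here is a self-contained sketch of the vectorised version, assuming the five sample rows from the question as the whole dataset. The cross-check against np.linalg.norm is my own addition for illustration, not something the question asked for.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question, re-typed so this runs on its own.
df = pd.DataFrame({
    "x1": [0.040928, 0.000000, 0.990905, 0.914052, 0.909504],
    "y1": [0.250813, 0.174905, 0.746038, 0.669185, 0.480774],
    "z1": [0.258730, 0.228518, 0.790401, 0.707238, 0.484074],
    "x2": [0.050584, 0.011435, 0.972913, 0.922316, 0.915810],
    "y2": [0.298290, 0.215528, 0.755414, 0.676172, 0.503221],
    "z2": [0.273055, 0.233548, 0.822155, 0.734213, 0.489242],
})

p1 = df[["x1", "y1", "z1"]].to_numpy()
p2 = df[["x2", "y2", "z2"]].to_numpy()


def minkowski_distance(a, b, p=2):
    # axis=1 collapses the three coordinates, leaving one distance per row.
    return np.sum(np.abs(a - b) ** p, axis=1) ** (1 / p)


for p in range(1, 4):
    df[f"m({p})"] = minkowski_distance(p1, p2, p)
    # Sanity check: the row-wise vector p-norm of the difference is the same thing.
    assert np.allclose(df[f"m({p})"], np.linalg.norm(p1 - p2, ord=p, axis=1))

print(df)
```

Each m(p) column has the same length as df, so every value lines up with the (x1, y1, z1) and (x2, y2, z2) pair in its own row.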