The 7 columns of each row in df_centroids give the coordinates of a point in 7-dimensional space.
import numpy as np
import pandas as pd
import scipy
df_centroids
0 1 2 3 4 5 6
0 2.443664 -0.158806 -0.403137 0.609063 -0.412371 -0.486611 -0.687598
1 -0.389052 1.258986 -0.517471 -0.127748 0.379712 -0.486611 -0.143564
2 -0.215555 0.201088 1.149816 -0.501471 0.275600 -0.088475 1.434132
3 -0.227075 -0.806379 -0.412111 -0.174150 -0.417327 -0.401676 -0.234962
4 -0.130615 0.197548 1.282325 -0.940454 0.161774 2.167632 -0.263252
5 0.015202 -0.125552 -0.665733 1.792274 -0.360096 -0.390093 -0.044649
I'm trying to calculate the Euclidean distance of each row from the origin and save it in a 'Euclidean Distance' column. Please see the code below:
df_centroids['Euclidean Distance']=''
from scipy.spatial import distance
i=0
while i<len(df_centroids.index):
    centroid=[df_centroids.iloc[i,0], df_centroids.iloc[i,1], df_centroids.iloc[i,2], df_centroids.iloc[i,3], df_centroids.iloc[i,4], df_centroids.iloc[i,5], df_centroids.iloc[i,6]]
    df_centroids[i,7]=distance.euclidean([0, 0, 0, 0, 0, 0, 0], centroid)
    i+=1
df_centroids
0 1 2 3 4 5 6 'Euclidean Distance' (0, 7) (1, 7) (2, 7) (3, 7) (4, 7) (5, 7) (6, 7) (7, 7)
0 2.443664 -0.158806 -0.403137 0.609063 -0.412371 -0.486611 -0.687598 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
1 -0.389052 1.258986 -0.517471 -0.127748 0.379712 -0.486611 -0.143564 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
2 -0.215555 0.201088 1.149816 -0.501471 0.275600 -0.088475 1.434132 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
3 -0.227075 -0.806379 -0.412111 -0.174150 -0.417327 -0.401676 -0.234962 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
4 -0.130615 0.197548 1.282325 -0.940454 0.161774 2.167632 -0.263252 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
5 0.015202 -0.125552 -0.665733 1.792274 -0.360096 -0.390093 -0.044649 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
6 0.256554 1.422368 1.139299 -0.917565 6.804388 -0.486611 0.726889 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
7 6.010360 0.643581 2.401293 -1.193860 0.068166 1.636784 0.726889 2.722099 1.556305 1.949607 1.136964 2.716432 1.988787 7.161965 6.851439
As you can see, instead of calculating the Euclidean distance for each row, the code creates 8 new columns and copies the same set of values into every row. Where am I going wrong?
I have searched online for a solution but have had no luck so far. I would really appreciate any help.
When working with numpy, you rarely need to write explicit loops. Highly tuned vector and matrix operations exist for most use cases.
For your problem, note that the Euclidean distance to the origin is the same as the Euclidean norm. There is a function in numpy.linalg for that.
To calculate the Euclidean (L2) norm of one vector:
import numpy as np
np.linalg.norm([1, 2, 3])
# 3.7416573867739413
To calculate the norm for a matrix of row vectors individually for each row (as in your problem):
np.linalg.norm([[1,2,3],
[4,5,6]], axis=1)
# array([3.74165739, 8.77496439])
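Applied to the DataFrame from the question, the row-wise call might look like the sketch below. It assumes df_centroids holds only the 7 numeric coordinate columns at the point where the norm is computed; only the first two centroids are reproduced here as sample data.

```python
import numpy as np
import pandas as pd

# First two centroids from the question, used here as sample data.
df_centroids = pd.DataFrame([
    [2.443664, -0.158806, -0.403137, 0.609063, -0.412371, -0.486611, -0.687598],
    [-0.389052, 1.258986, -0.517471, -0.127748, 0.379712, -0.486611, -0.143564],
])

# Row-wise Euclidean norm over the coordinate columns (axis=1).
# The right-hand side is evaluated before the new column exists,
# so the distance column does not feed into its own calculation.
df_centroids['Euclidean Distance'] = np.linalg.norm(df_centroids.to_numpy(), axis=1)
print(df_centroids)
```

If the DataFrame already contains other columns, select the coordinate columns first (for example with df_centroids.iloc[:, :7]) before taking the norm.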
To calculate the norm for a matrix of column vectors individually for each column:
np.linalg.norm([[1, 4],
[2, 5],
[3, 6]], axis=0)
# array([3.74165739, 8.77496439])