I have a data frame of correlations between two variables from three different sources, so I am trying to perform k-means clustering with three centroids. The data frame isn't included in this code, so assume that the two columns of data are in the variable cdf. However, I keep getting an error. Can you spot it?
def dis(v,w):
    #Sum of square of distances from x and y to ensure positive values, then square root to find actual value
    return ((w[1]-v[1])**2 + (w[0]-v[0])**2)**.5

def assign(p1,p2,p3,d):
    gps={1:[],2:[],3:[]} #Three empty arrays.
    for i in dt:
        if dis(i,p1)<dis(i,p2) and dis(i,p1)<dis(i,p3): #If closest to first point, put in first group.
            gps[1].append(i)
        elif dis(i,p2)<dis(i,p1) and dis(i,p2)<dis(i,p3): #If closest to second point, put in second group.
            gps[2].append(i)
        else: #If closest to third point, put in third group.
            gps[3].append(i)
    return gps

p1=[3,3]
p2=[4,4]
p3=[5,5]
gps=assign(p1,p2,p3,cdf)
The final line of code is giving me the error.
TypeError: unsupported operand type(s) for -: 'int' and 'str'
and it's pointing to the return statement of my distance function. But I can't find the problem. Thanks in advance.
Edited to add whole traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-d4f398e2f10f> in <module>()
----> 1 gps=assign(p1,p2,p3,cdf)
1 frames
<ipython-input-40-1132f55271a6> in dis(v, w)
2 def dis(v,w):
3 #Sum of square of distances from x and y to ensure positive values, then square root to find actual value
----> 4 return ((w[1]-v[1])**2 + (w[0]-v[0])**2)**.5
5
6 #Average array of points function
TypeError: unsupported operand type(s) for -: 'int' and 'str'
It looks like your arrays w and v might contain string values (either one of them or both). If you have values like ["1", "2", ...] or ["3.4", "2.1", ...], you can convert them to floats before subtracting:
def dis(v,w):
    #Convert each coordinate to float in case it arrived as a string
    w1 = float(w[1])
    v1 = float(v[1])
    w0 = float(w[0])
    v0 = float(v[0])
    return ((w1-v1)**2 + (w0-v0)**2)**.5
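One more thing that may be worth checking, assuming cdf is a pandas DataFrame: iterating over a DataFrame directly (for i in cdf) yields its column labels, not its rows, so each i could be a label string rather than a point. Below is a minimal sketch of the assignment step that takes plain rows and converts each coordinate to float first. The names to_point and rows, and the sample values, are hypothetical and only for illustration.

```python
import math

def to_point(row):
    """Convert a 2-item row such as ["3.4", "2.1"] into a tuple of floats."""
    return (float(row[0]), float(row[1]))

def dis(v, w):
    """Euclidean distance between two 2-D points."""
    return math.hypot(w[0] - v[0], w[1] - v[1])

def assign(centroids, rows):
    """Group each row under the 1-based index of its nearest centroid."""
    gps = {k: [] for k in range(1, len(centroids) + 1)}
    for row in rows:
        point = to_point(row)
        # min() keeps the first key on a tie, so ties go to the lower group
        nearest = min(gps, key=lambda k: dis(point, centroids[k - 1]))
        gps[nearest].append(point)
    return gps

# Hypothetical sample rows. With a pandas DataFrame you would pass rows,
# e.g. cdf.values.tolist(), instead of iterating over cdf itself.
rows = [["3.1", "2.9"], ["4.2", "4.0"], [5, 5.5]]
gps = assign([[3, 3], [4, 4], [5, 5]], rows)
```

Note that this sketch sends ties to the lowest-numbered group, whereas the code in the question sends them to the third group through the else branch.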