python, k-means

K-Means Clustering: getting an unsupported operand type(s) error


I have a data frame of correlations between two variables from three different sources, so I am trying to perform k-means clustering with three centroids. I haven't included the data frame in this code, so assume that the two columns of data are in the variable cdf. However, I keep getting an error. Can you spot the cause?

def dis(v,w):
  #Sum of square of distances from x and y to ensure positive values, then square root to find actual value
  return ((w[1]-v[1])**2 + (w[0]-v[0])**2)**.5

def assign(p1,p2,p3,dt):
    gps={1:[],2:[],3:[]} #Three empty lists, one per centroid.
    for i in dt:
        if dis(i,p1)<dis(i,p2) and dis(i,p1)<dis(i,p3): #If closest to first point, put in first group.
            gps[1].append(i)
        elif dis(i,p2)<dis(i,p1) and dis(i,p2)<dis(i,p3): #If closest to second point, put in second group.
            gps[2].append(i)
        else:  #Otherwise (closest to third point, or a tie), put in third group.
            gps[3].append(i)
    return gps

p1=[3,3]
p2=[4,4]
p3=[5,5]

gps=assign(p1,p2,p3,cdf)

The final line of code is giving me the error.

TypeError: unsupported operand type(s) for -: 'int' and 'str'

and it's pointing to the return statement of my distance function. But I can't find the problem. Thanks in advance.

Edited to add whole traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-d4f398e2f10f> in <module>()
----> 1 gps=assign(p1,p2,p3,cdf)

1 frames
<ipython-input-40-1132f55271a6> in dis(v, w)
      2 def dis(v,w):
      3   #Sum of square of distances from x and y to ensure positive values, then square root to find actual value
----> 4   return ((w[1]-v[1])**2 + (w[0]-v[0])**2)**.5
      5 
      6 #Average array of points function

TypeError: unsupported operand type(s) for -: 'int' and 'str'

Solution

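  • The message says the left operand is an int and the right one is a str, so in w[1]-v[1] the value taken from v (the data point) is a string, while w (the centroid, e.g. p1=[3,3]) is numeric. If your data holds values like ["1", "2", ...] or ["3.4", "2.1", ...], convert them to float before subtracting: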

    def dis(v,w):
        # Convert every coordinate to float so numeric strings such as "3.4" work
        w1 = float(w[1])
        v1 = float(v[1])
        w0 = float(w[0])
        v0 = float(v[0])
        return ((w1-v1)**2 + (w0-v0)**2)**.5
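
Putting the conversion together with the assignment step, here is a minimal sketch rather than a definitive fix. The `rows` list is a hypothetical stand-in for the rows of cdf, and `assign` is rewritten to take a list of centroids, which is not how the question's version is called. It also assumes the strings in the data are numeric. One more thing worth checking: if cdf is a pandas DataFrame, `for i in cdf` yields the column labels (strings) rather than the rows, which would produce this same TypeError even when the cell values are numeric.

```python
import math

def dis(v, w):
    """Euclidean distance between two 2-D points.

    Coordinates may be numbers or numeric strings; each is converted to float.
    """
    return math.hypot(float(w[0]) - float(v[0]), float(w[1]) - float(v[1]))

def assign(centroids, points):
    """Group each point under the 1-based index of its nearest centroid."""
    gps = {k: [] for k in range(1, len(centroids) + 1)}
    for pt in points:
        # min() returns the first key on a tie, so ties go to the lowest-numbered group
        nearest = min(gps, key=lambda k: dis(pt, centroids[k - 1]))
        gps[nearest].append(pt)
    return gps

# Hypothetical stand-in for the rows of cdf (a mix of numeric strings and numbers).
# Assumption: with a pandas DataFrame you would pass rows, e.g. cdf.values.tolist(),
# because iterating the DataFrame itself yields column labels, not rows.
rows = [["2.9", "3.1"], [4, 4.2], ["5.5", "5"]]

gps = assign([[3, 3], [4, 4], [5, 5]], rows)
print(gps)
```

Note that this sketch sends ties to the lowest-numbered centroid, whereas the if/elif/else chain in the question sends them to the third group.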