I'm writing a simple Python program for k-medians clustering, but I have run into a problem that I think is related to variable scoping.
Here is my clustering code:
    import numpy as np

    class Cluster(object):
        center = None
        points = []

        def __init__(self, center):
            super(Cluster, self).__init__()
            self.center = center

    def manhattan(row_a, row_b):
        # Sum of absolute coordinate differences between the two rows.
        dimensions = len(row_a)
        manhattan_dist = 0
        for i in range(0, dimensions):
            manhattan_dist = manhattan_dist + np.abs(float(row_a[i]) - float(row_b[i]))
        return manhattan_dist

    def cluster(dataset, cluster_centers):
        clusters = []
        for cluster_center in cluster_centers:
            clusters.append(Cluster(center = cluster_center))
        for point in dataset:
            # Both variables are reset for every point.
            last_dist = np.inf
            last_cluster = None
            for cluster in clusters:
                dist = manhattan(point, cluster.center)
                if(dist != 0):
                    if (dist < last_dist):
                        print(str(dist) + " " + str(last_dist))
                        last_dist = dist
                        last_cluster = cluster
            last_cluster.points.append(point)
        return clusters

    result = cluster([[1,1], [1,2], [1,3], [7,2], [8,3], [7,1]], [[2,2], [6,6]])
and here is the output that I got
The problem is that assigning a value to the variable "last_dist" (and possibly "last_cluster") inside the for-loop over clusters does not seem to take effect. According to the printed output, the value is never updated, except for one single iteration where it is 7 before it goes back to its original value of "inf". What is the root cause of this, and what can I do about it? Thank you.
What else do you expect to happen? Here is your code:
    for point in dataset:
        last_dist = np.inf  # this line is executed 6 times
        last_cluster = None
        for cluster in clusters:
            ...
You only have 2 items in clusters, and 6 in dataset. Therefore, for each point (6 times), last_dist starts as inf. You have 6 inf values in your output, so that is working as expected. For the second cluster, last_dist is only printed if it meets your condition if (dist < last_dist). It looks like it does this exactly once, which is why you get 7.0 instead of inf. Perhaps you have a bug in manhattan()?
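To check the reasoning above, here is a minimal Python 3 sketch that records every update of last_dist instead of printing it. It is not the original code: trace_updates is a hypothetical helper, the manhattan function is rewritten without NumPy, and the Cluster objects are replaced by plain center lists, but the loop structure and the two conditions are the same as in the question.

```python
def manhattan(row_a, row_b):
    """Sum of absolute coordinate differences between two rows."""
    return sum(abs(float(a) - float(b)) for a, b in zip(row_a, row_b))


def trace_updates(dataset, centers):
    """Collect (point, dist, last_dist) each time the inner condition holds.

    Hypothetical helper for illustration only; it mirrors the nested loops
    in the question but stores the values that the question prints.
    """
    updates = []
    for point in dataset:
        last_dist = float("inf")  # reset once per point, as in the question
        for center in centers:
            dist = manhattan(point, center)
            if dist != 0 and dist < last_dist:
                updates.append((point, dist, last_dist))
                last_dist = dist
    return updates


dataset = [[1, 1], [1, 2], [1, 3], [7, 2], [8, 3], [7, 1]]
centers = [[2, 2], [6, 6]]

for point, dist, last_dist in trace_updates(dataset, centers):
    print(point, dist, last_dist)
```

With this data, each of the 6 points produces one update against the first center, where last_dist is still inf. Only the point [8, 3] produces a second update, because its distance to [6, 6] (5.0) is smaller than its distance to [2, 2] (7.0). That accounts for the single 7.0 in the output and suggests the distances themselves are computed as intended.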
Because