In MATLAB, the kmeans function can return sumd, the within-cluster sums of point-to-centroid distances, as a k-by-1 vector.
[idx,C,sumd] = kmeans(___)
I need to do the same in Python.
I have found that km.transform returns an array of distances from each point to each cluster centre:
array([[0.13894406, 2.90411146],
[3.25560603, 0.21255051],
[2.43748321, 0.60557231],
[1.16330349, 4.20635901],
[0.53391368, 2.50914184],
[3.43498204, 0.39192652]])
If I call km.predict, I get the cluster label of each point:
array([0, 1, 1, 0, 0, 1], dtype=int32)
I'm struggling to figure out how I can calculate the mean distance for each cluster.
Any suggestions would be appreciated.
You can use np.bincount:
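import numpy as np

# Distances from each point to each centroid (the km.transform output)
dists = np.array([[0.13894406, 2.90411146],
                  [3.25560603, 0.21255051],
                  [2.43748321, 0.60557231],
                  [1.16330349, 4.20635901],
                  [0.53391368, 2.50914184],
                  [3.43498204, 0.39192652]])
# Cluster label of each point (the km.predict output)
ids = np.array([0, 1, 1, 0, 0, 1], dtype=np.int32)
# Pick each point's distance to its own centroid, sum those per cluster
# (bincount with weights), then divide by the number of points per cluster
np.bincount(ids, dists[np.arange(len(dists)), ids]) / np.bincount(ids)
# array([0.61205374, 0.40334978])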
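If you want the per-cluster sums (closer to MATLAB's sumd) as well as the means, the same bincount idea can be wrapped in a small helper. This is only a sketch: the function name per_cluster_distance_stats and its squared flag are made up for illustration and are not part of NumPy or scikit-learn. It assumes dists has one column per cluster, as km.transform returns. As far as I know, MATLAB's kmeans uses squared Euclidean distance by default while km.transform returns plain Euclidean distances, so pass squared=True if you need to match that default.

```python
import numpy as np

def per_cluster_distance_stats(dists, labels, squared=False):
    """Hypothetical helper: per-cluster sum and mean of each point's
    distance to its own centroid."""
    dists = np.asarray(dists, dtype=float)
    labels = np.asarray(labels)
    k = dists.shape[1]  # assumption: one column per cluster
    # Distance from each point to the centroid it was assigned to
    own = dists[np.arange(len(dists)), labels]
    if squared:
        own = own ** 2
    # minlength keeps empty clusters in the output instead of dropping them
    sums = np.bincount(labels, weights=own, minlength=k)
    counts = np.bincount(labels, minlength=k)
    # Empty clusters get NaN instead of a divide-by-zero warning
    means = np.divide(sums, counts, out=np.full(k, np.nan), where=counts > 0)
    return sums, means

dists = np.array([[0.13894406, 2.90411146],
                  [3.25560603, 0.21255051],
                  [2.43748321, 0.60557231],
                  [1.16330349, 4.20635901],
                  [0.53391368, 2.50914184],
                  [3.43498204, 0.39192652]])
labels = np.array([0, 1, 1, 0, 0, 1])

sums, means = per_cluster_distance_stats(dists, labels)
print(sums)   # per-cluster sums of distances
print(means)  # per-cluster mean distances
```

With the data from the question, means reproduces the values above and sums gives the unnormalised totals per cluster.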