
Python count occurrences of labels in Kmeans


I'm trying to compare the list of labels from scikit-learn's KMeans with the labels it predicts for another dataset. The two label lists have different sizes, so I want the number of occurrences of each label.

I have already tried using Counter, but I'm not getting exactly what I want. At the moment I'm using np.unique, and there are still some problems.
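A Counter only stores labels that actually occur, which is likely why printing it shows no entry for an empty cluster. Looking up an absent key returns 0, though, so reading the Counter back over the full label range does produce the zeros. A minimal sketch, assuming labels are the integers 0..n_clusters-1 as KMeans produces; the helper name `counts_per_cluster` is hypothetical, and the labels are hard-coded from the predicted output in this question so the snippet runs without scikit-learn:

```python
from collections import Counter

def counts_per_cluster(labels, n_clusters):
    # Counter only records labels that appear in `labels`...
    counter = Counter(int(x) for x in labels)
    # ...but counter[k] returns 0 for a missing key, so walk every cluster index
    return [counter[k] for k in range(n_clusters)]

# Predicted labels copied from the question; 4 matches KMeans(n_clusters=4)
new_labels = [3, 0, 3, 3, 0, 2, 0]
print(counts_per_cluster(new_labels, 4))  # [3, 0, 1, 3]
```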

As an example:

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
    kmeans = KMeans(n_clusters=4, random_state=0).fit(X)

    Unique, count = np.unique(kmeans.labels_, return_counts=True)
    print(count)  # [2 2 1 1] so far so good

    New_Labels = kmeans.predict([[0, 4], [4, 4], [0, 5], [1, 6], [7, 2], [4, 0], [4, 2]])
    print(New_Labels)  # [3 0 3 3 0 2 0] also good

    Unique1, count1 = np.unique(New_Labels, return_counts=True)

This is where I run into the problem:

    print(Unique1, count1)  # [0 2 3] [3 1 3], no entry for label 1

I would like the count to also show 0 for any cluster label that does not appear in the predictions. So I would like the counts of my predicted labels to be:

[3 0 1 3]
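For reference, the np.unique approach can be kept by scattering its counts into a zero-filled array indexed by cluster label, so labels that never occur stay at 0. A minimal sketch, assuming integer labels in 0..n_clusters-1; `unique_counts_full` is a hypothetical helper name, and the labels are hard-coded from the output above rather than produced by KMeans:

```python
import numpy as np

def unique_counts_full(labels, n_clusters):
    # np.unique only reports the labels that are present, with their counts
    uniq, cnt = np.unique(labels, return_counts=True)
    # Start from all zeros and fill in the slots for labels that did occur
    full = np.zeros(n_clusters, dtype=int)
    full[uniq] = cnt
    return full

new_labels = np.array([3, 0, 3, 3, 0, 2, 0])
print(unique_counts_full(new_labels, 4))  # [3 0 1 3]
```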

Solution

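  • You can use the following list comprehension, which goes through every possible cluster label and counts its occurrences with list.count. Convert the NumPy array to a list first (NumPy arrays have no .count method), and iterate over range(kmeans.n_clusters) rather than range(max(l)+1) so that empty clusters with the highest indices are not dropped:

    l = New_Labels.tolist()  # list.count needs a plain Python list
    [l.count(i) for i in range(kmeans.n_clusters)]
    [3, 0, 1, 3]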
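NumPy also has a single call for this: np.bincount counts each non-negative integer, and its minlength argument pads the result with zeros up to the number of clusters. Unlike the list comprehension, it makes one pass over the labels instead of one per cluster. A minimal sketch with the predicted labels hard-coded from the question; in the real code n_clusters would be kmeans.n_clusters. Dividing by the total is one way (an assumption about what "compare" means here) to put label lists of different sizes on the same scale:

```python
import numpy as np

new_labels = np.array([3, 0, 3, 3, 0, 2, 0])
n_clusters = 4  # stands in for kmeans.n_clusters

# One slot per label 0..n_clusters-1; never-predicted clusters get 0
counts = np.bincount(new_labels, minlength=n_clusters)
print(counts)  # [3 0 1 3]

# Fraction of points per cluster, comparable across lists of different length
fractions = counts / counts.sum()
print(fractions)
```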