python scikit-learn cluster-analysis k-means unsupervised-learning

K-means performance


I have a large dataset in which each sample has a specific class number from 0 to 8. I used the K-means algorithm from the scikit-learn (sklearn) Python package. The output of K-means is different each time I run the code. For example, the 246th sample belongs to cluster 3 in the first run and to cluster 0 in the second run. I have also attached an image for your consideration.

I think this is caused by the random initialization of the cluster centers, but I need consistent results across runs. How can I fix it?
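A minimal sketch of the random-initialization point, assuming scikit-learn's `KMeans`: passing an integer `random_state` makes the centroid initialization deterministic, so two runs on the same data return the same labels. The data below is synthetic (`make_blobs` with 9 centers, standing in for classes 0 to 8) and `run_kmeans` is a hypothetical helper; this only makes runs reproducible, it does not give the cluster numbers any class meaning.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real dataset (assumption: 9 groups, like classes 0-8).
X, y = make_blobs(n_samples=500, centers=9, random_state=0)

def run_kmeans(seed):
    # An integer random_state fixes the initialization of the cluster centers,
    # so repeated calls with the same seed on the same data give the same labels.
    model = KMeans(n_clusters=9, n_init=10, random_state=seed)
    return model.fit_predict(X)

labels_a = run_kmeans(42)
labels_b = run_kmeans(42)

print("identical labels:", np.array_equal(labels_a, labels_b))
print("sample 246:", labels_a[246], labels_b[246])
```

With a different seed (or no seed), the same groups of samples may come back under different cluster numbers.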


Solution

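  • The cluster numbers are not real data; they are arbitrary labels that only distinguish one cluster from another. They are not meant to be stable, so the same group of samples can get a different number on each run.

    To know which real class a cluster corresponds to, you need to associate each cluster number with a known class.

    Your example:

    Run 1: sample 246 is in cluster #3

    Run 2: sample 246 is in cluster #0

    You need to give a name (a known class) to the cluster that contains sample 246 in each run, instead of relying on the cluster number itself.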
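A minimal sketch of associating cluster numbers with known classes, assuming the known class numbers (0 to 8) are available as an array `y` aligned with the samples. Each cluster is named after the most common known class among its members (majority vote). The data is synthetic (`make_blobs`), and `cluster_to_class_map` is a hypothetical helper, not part of scikit-learn. The adjusted Rand index is included because it compares two clusterings while ignoring how the clusters are numbered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in: y plays the role of the known class numbers 0-8.
X, y = make_blobs(n_samples=500, centers=9, random_state=0)

def cluster_to_class_map(cluster_labels, true_labels):
    """Map each cluster number to the known class most common inside it."""
    mapping = {}
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        mapping[int(c)] = int(np.bincount(members).argmax())
    return mapping

# Two runs with different seeds may number the same clusters differently.
run1 = KMeans(n_clusters=9, n_init=10, random_state=1).fit_predict(X)
run2 = KMeans(n_clusters=9, n_init=10, random_state=2).fit_predict(X)

map1 = cluster_to_class_map(run1, y)
map2 = cluster_to_class_map(run2, y)

# Translate the raw cluster numbers of each run into known class names.
named1 = np.array([map1[int(c)] for c in run1])
named2 = np.array([map2[int(c)] for c in run2])

print("sample 246:", run1[246], "->", named1[246], "|", run2[246], "->", named2[246])
# ARI ignores cluster numbering; 1.0 would mean the two partitions are identical.
print("ARI between runs:", adjusted_rand_score(run1, run2))
```

Majority vote can assign two clusters to the same class; if a strict one-to-one mapping is needed, an assignment solver such as `scipy.optimize.linear_sum_assignment` on the cluster/class contingency table is a common alternative.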