Relating to the question "Starting question", I have doubts about calculating the coordinates of the cluster centres and labeling the centres:
kmeans.cluster_centers_
gives
[[ 4.87744023 -0.48344163]
[ 8.29540909 6.7398487 ]
[ 1.05638163 3.84314976]]
I'm confused by the order of the centres. The first one is the 'green' cluster (label 2 in the plot), the second one is the 'red' cluster (label 0 in the plot), and the last one is the 'blue' cluster, with label 1 in the plot. What is the logic behind it?
Also, what if I have labeled data as a starting point for clustering, for example the Wine quality dataset (WineQuality) or Twitter sentiment analysis data? I know the labels for the clusters and would like to preserve them as cluster labels and, of course, relate them to the cluster centres.
The order of the clusters is arbitrary; there is no significance attached to it. It typically follows from how the initial centres were chosen, which is random in most implementations, so it can change between runs. It doesn't really make any difference, as the numbers are just labels.
If your data points already have labels, then simply take the n data points closest to the centre of each cluster and assign the cluster the most frequent label among them. You are unlikely to get a clustering as clean as in the example, as there will commonly be data points assigned to a different cluster than their label suggests, or lying in between clusters.
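As a rough illustration of that majority vote, here is a minimal sketch using only the Python standard library. The function name `label_clusters`, the toy points, and their labels are made up for the example; only the three centres are taken from the output in the question. It assumes Euclidean distance and considers all points when looking for the n nearest to a centre.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def label_clusters(points, true_labels, centres, n=5):
    """Give each cluster the most frequent known label among the n points nearest its centre."""
    cluster_labels = {}
    for k, centre in enumerate(centres):
        # Indices of the n points closest to this centre.
        nearest = sorted(range(len(points)), key=lambda i: dist(points[i], centre))[:n]
        # Majority vote over the known labels of those points.
        votes = Counter(true_labels[i] for i in nearest)
        cluster_labels[k] = votes.most_common(1)[0][0]
    return cluster_labels


# Centres as printed by kmeans.cluster_centers_ in the question.
centres = [(4.87744023, -0.48344163), (8.29540909, 6.7398487), (1.05638163, 3.84314976)]

# Hypothetical labeled points; a few labels deliberately disagree with their neighbours.
points = [(4.9, -0.5), (5.1, -0.2), (4.6, -0.9),
          (8.3, 6.7), (8.0, 7.1), (8.6, 6.4),
          (1.0, 3.8), (1.3, 4.1), (0.8, 3.5)]
true_labels = ["green", "green", "red",
               "red", "red", "red",
               "blue", "blue", "green"]

mapping = label_clusters(points, true_labels, centres, n=3)
print(mapping)  # {0: 'green', 1: 'red', 2: 'blue'}
```

The resulting dictionary maps each row index of the centres to a known label, so the arbitrary cluster numbers can be translated into meaningful names. With real data, a larger n makes the vote less sensitive to individual mislabeled or borderline points.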
The procedure would basically be: