I am doing k-means clustering with sklearn in Python. I am wondering how to change the generated label names for the k-means clusters. For example:
data Cluster
0.2344 1
1.4537 2
2.4428 2
5.7757 3
and I want to turn it into:
data Cluster
0.2344 black
1.4537 red
2.4428 red
5.7757 blue
I don't mean directly mapping 1 -> black, 2 -> red when printing.
I am wondering whether it is possible to set different cluster names in the KMeans model by default.
No.
There isn't any way to change the default labels.
You have to map them separately using a dictionary.
You can look at all the available methods and attributes in the scikit-learn KMeans documentation.
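None of the available methods or attributes allows you to change the default labels. KMeans always numbers the clusters 0 to n_clusters - 1, so in the example above the labels would actually be 0, 1, 2 rather than 1, 2, 3.
Solution using a dictionary: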
# Code
a = [0,0,1,1,2,2]
mapping = {0:'black', 1:'red', 2:'blue'}
a = [mapping[i] for i in a]
# Output
['black', 'black', 'red', 'red', 'blue', 'blue']
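Since kmeans.labels_ is a NumPy integer array running from 0 to n_clusters - 1, a minimal alternative sketch is to keep the names in an array and index it with the labels directly. The hard-coded labels and the names array below are illustrative stand-ins, not anything KMeans provides.

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 2, 2])        # stands in for kmeans.labels_
names = np.array(['black', 'red', 'blue'])   # names[i] is the name chosen for cluster i
named = names[labels]                        # NumPy fancy indexing: one lookup per label
print(named.tolist())  # ['black', 'black', 'red', 'red', 'blue', 'blue']
```

This gives the same result as the list comprehension but stays a NumPy array, which is convenient if you then put it into a DataFrame column or pass it to a plotting function.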
What if you change your data or the number of clusters?
First, let's visualize the clusters.
Code:
Importing and generating random data:
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
# 10 random 2-D points with coordinates between 0 and 100
x = np.random.uniform(0, 100, size=(10, 2))
Applying the k-means algorithm:
kmeans = KMeans(n_clusters=3, random_state=0).fit(x)
Getting the cluster centers:
arr = kmeans.cluster_centers_
Your cluster centroids will look something like this (the exact values depend on the random data):
array([[23.81072765, 77.21281171],
[ 8.6140551 , 23.15597377],
[93.37177176, 32.21581703]])
Here, the 1st row is the centroid of cluster 0, the 2nd row is the centroid of cluster 1, and so on.
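One way to confirm that row i of cluster_centers_ belongs to label i, as a small sketch: run predict() on the centroids themselves. Each centroid is at distance 0 from itself, so (as long as no two centroids coincide) it is assigned its own label. The seeded generator and n_init=10 are arbitrary choices for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=(10, 2))
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

# Each centroid's nearest centroid is itself, so row i maps back to label i.
center_labels = kmeans.predict(kmeans.cluster_centers_)
print(center_labels)  # [0 1 2]
```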
Visualizing the centroids and data:
plt.scatter(x[:,0], x[:,1])
plt.scatter(arr[:,0], arr[:,1])
plt.show()
You get a scatter plot of the 10 data points with the 3 centroids overlaid in a second color.
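If the names you want are colors, as in the question, a sketch like the following colors each point by its mapped cluster name and marks the centroids. The color list, the green "x" markers for centroids, the output filename, and the Agg backend (used only so it runs without a display) are all assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=(10, 2))
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

# Hypothetical label -> color mapping; any valid matplotlib color names work.
colors = np.array(['black', 'red', 'blue'])
point_colors = colors[kmeans.labels_].tolist()

fig, ax = plt.subplots()
sc = ax.scatter(x[:, 0], x[:, 1], c=point_colors)  # points colored by cluster name
ax.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
           marker='x', c='green', s=100)  # centroids drawn as large green crosses
fig.savefig("clusters.png")
```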
As you can see, you have access to the centroids as well as the training data. If your training data and number of clusters stay the same (and random_state is fixed), these centroids don't change.
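A quick sketch to check that claim: fit twice on identical data with identical settings and compare the results. The data and seed are arbitrary; np.allclose is used for the centers to tolerate any tiny floating-point differences.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
x = rng.uniform(0, 100, size=(10, 2))

# Same data, same n_clusters, same random_state: expect the same centroids and labels.
first = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)
second = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)
same_centers = np.allclose(first.cluster_centers_, second.cluster_centers_)
same_labels = np.array_equal(first.labels_, second.labels_)
print(same_centers, same_labels)
```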
But if you add more training data or increase the number of clusters, you will have to create a new mapping based on the newly generated centroids.
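A minimal sketch of one way to build that mapping so it doesn't depend on the arbitrary label numbers: sort the clusters by their centroid and hand out names in that order. It uses the 1-D data from the question; the color list and the "smallest centroid gets the first name" rule are assumptions for illustration, and for multi-dimensional data you would need to choose your own ordering rule.

```python
import numpy as np
from sklearn.cluster import KMeans

# Data from the question, reshaped to (n_samples, 1) because KMeans expects 2-D input.
data = np.array([0.2344, 1.4537, 2.4428, 5.7757]).reshape(-1, 1)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)

color_names = ['black', 'red', 'blue']  # hypothetical names, lowest centroid first
# Cluster ids ordered by centroid value, smallest first.
order = np.argsort(kmeans.cluster_centers_[:, 0])
label_to_name = {int(cluster): color_names[rank] for rank, cluster in enumerate(order)}
names = [label_to_name[int(label)] for label in kmeans.labels_]
print(names)
```

Because the names follow the centroid order rather than the raw label numbers, refitting on new data or with a different random_state should still give the leftmost cluster "black" and so on, although the mapping has to be rebuilt after every fit.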