
k-means clustering mapping


I am using scikit-learn's KMeans and want to replace each computed cluster label with the corresponding centroid value from the trained model.

The code I am using is as follows:

from sklearn.cluster import KMeans

# Initialize K-Means clustering model-
kmeans_conv1 = KMeans(n_clusters = 5)

# Train model on training data (compute k-means clustering)-
kmeans_conv1.fit(conv1_nonzero.reshape(-1, 1))

# number of clusters used-
kmeans_conv1.n_clusters
# 5

# Get centroids-
kmeans_conv1.cluster_centers_
'''
array([[-0.05669265],
       [ 0.06742188],
       [-0.08835593],
       [ 0.03749201],
       [ 0.0896403 ]], dtype=float32)
'''


# Clustered labels of each data point-
kmeans_conv1.labels_

set(kmeans_conv1.labels_)
# {0, 1, 2, 3, 4}

# Get clustered label for each data point-
clustered_labels = kmeans_conv1.labels_

Currently, I map the labels to the centroid values with an if-else chain:

new_clusters = []


for clabel in clustered_labels:
    if clabel == 0:
        new_clusters.append(kmeans_conv1.cluster_centers_[0][0])
    elif clabel == 1:
        new_clusters.append(kmeans_conv1.cluster_centers_[1][0])
    elif clabel == 2:
        new_clusters.append(kmeans_conv1.cluster_centers_[2][0])
    elif clabel == 3:
        new_clusters.append(kmeans_conv1.cluster_centers_[3][0])
    elif clabel == 4:
        new_clusters.append(kmeans_conv1.cluster_centers_[4][0])

In the end, I want new_clusters (a list or np.array) to contain the centroid values instead of the cluster labels.

Is there a better way to achieve this without the if-else chain?


Solution

  • Indexing cluster_centers_ with the label itself is sufficient, so no if-else chain is needed:

    for clabel in clustered_labels:
        new_clusters.append(
            kmeans_conv1.cluster_centers_[clabel][0]
        )
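
    The loop above can probably be dropped as well, since NumPy integer-array indexing looks up every label in one step. The following is a minimal sketch rather than a tested drop-in: the centroids are copied from the question's output, and the labels array is a made-up stand-in for kmeans_conv1.labels_.

```python
import numpy as np

# Centroids copied from the question's output, shape (5, 1).
centers = np.array(
    [[-0.05669265], [0.06742188], [-0.08835593], [0.03749201], [0.0896403]],
    dtype=np.float32,
)

# Hypothetical labels standing in for kmeans_conv1.labels_.
labels = np.array([0, 3, 3, 1, 4, 2, 0])

# Integer-array indexing selects one centroid row per label;
# ravel() flattens the (n, 1) result to shape (n,).
new_clusters = centers[labels].ravel()

# With the fitted model from the question, the equivalent expression would be:
# new_clusters = kmeans_conv1.cluster_centers_[kmeans_conv1.labels_].ravel()
```

    This returns an np.array rather than a list; calling .tolist() on the result gives a plain list if that is preferred.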