Tags: algorithm, machine-learning, data-science, cluster-analysis, k-means

How to find silhouette_score for K-means cluster Algorithm


I am trying to compute silhouette_score for the K-means clustering algorithm. I am actually using four other algorithms as well, and I have to compute silhouette_score for all of them. I want to get it working for K-means first and then reuse the same code for the others.

import pandas as pd
import numpy as np

from sklearn.datasets import load_wine
df = load_wine()

from sklearn.preprocessing import MinMaxScaler

X_scaled_data = MinMaxScaler().fit_transform(df.data)

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3).fit(X_scaled_data)

from sklearn.metrics import silhouette_score

silhouette_avg = silhouette_score(X_scaled_data, kmeans.labels_)
print("For n_clusters =", 3, "The average silhouette_score is :", silhouette_avg)

Here is the error:

[Image: screenshot of the error message]


Solution

  • The code example you posted works for me.

    However, as the error message states, the number of unique labels (n_labels) in your predicted labels is no larger than 1. That means your algorithm assigns all points to the same cluster. If you look at the documentation for silhouette_score, you will notice that the metric is not defined in this case:

    Note that Silhouette Coefficient is only defined if number of labels is 2 <= n_labels <= n_samples - 1.

    Consider using a different metric, or check the number of unique labels in your predictions before calculating the silhouette score.
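The label check suggested above could look roughly like the sketch below. The helper name `safe_silhouette` is hypothetical (it is not part of scikit-learn), and the `n_init` and `random_state` values are arbitrary choices made here for reproducibility. The helper returns `None` instead of raising whenever the number of unique labels falls outside the documented range 2 <= n_labels <= n_samples - 1.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler


def safe_silhouette(X, labels):
    """Return the mean silhouette score, or None when it is undefined.

    Hypothetical helper: it only wraps silhouette_score with the
    documented validity check on the number of unique labels.
    """
    n_labels = len(np.unique(labels))
    n_samples = len(labels)
    # The score is only defined for 2 <= n_labels <= n_samples - 1.
    if n_labels < 2 or n_labels > n_samples - 1:
        return None
    return silhouette_score(X, labels)


X_scaled_data = MinMaxScaler().fit_transform(load_wine().data)

# n_init and random_state are assumptions made for repeatable results.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled_data)
kmeans_score = safe_silhouette(X_scaled_data, kmeans.labels_)

# A degenerate labelling (every point in one cluster) mimics the failing case.
single_cluster = np.zeros(len(X_scaled_data), dtype=int)
degenerate_score = safe_silhouette(X_scaled_data, single_cluster)

print("KMeans silhouette:", kmeans_score)
print("Single-cluster silhouette:", degenerate_score)
```

Since the helper only needs the data and a label array, the same call should work with the labels produced by the other clustering algorithms. A `None` result then signals that the algorithm collapsed everything into a single cluster (or gave every sample its own label) and that its parameters need a closer look.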