python k-means silhouette

How to append silhouette scores to a list


I want to append the silhouette score for each k to a list inside a loop.

        from sklearn.cluster import KMeans
        from sklearn.metrics import silhouette_score

        ks = range(1, 11) # for 1 to 10 clusters
        #sse = []
        sil = []

        for k in ks:
             # Create a KMeans instance with k clusters: model
             kmeans = KMeans(n_clusters = k)
             # Fit model to samples
             #kmeans.fit(X)
             cluster_labels = kmeans.fit_predict(X)  # X is the already-preprocessed dataset
             silhouette = silhouette_score(X, cluster_labels)


             # Append the inertia to the list of inertias
             #sse.append(kmeans.inertia_)

             #Append silhouette to the list
             sil.append(silhouette)

But I get the following error on the line where I compute silhouette with silhouette_score:

       ValueError                   Traceback (most recent call last)
       <ipython-input-12-2570ccf62502> in <module>()
       18     #kmeans.fit(X)
       19     cluster_labels = kmeans.fit_predict(X)
   --->20     silhouette = silhouette_score(X, cluster_labels)
       21 
       22 

Solution

    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    X, y = make_blobs(n_samples=500,
                      n_features=2,
                      centers=4,
                      cluster_std=1,
                      center_box=(-10.0, 10.0),
                      shuffle=True,
                      random_state=1)
    sil = []
    # Start the cluster range at 2: silhouette_score needs at least 2 clusters
    range_n_clusters = range(2, 10)
    
    for n_clusters in range_n_clusters:
        clusterer = KMeans(n_clusters=n_clusters, random_state=10)
        cluster_labels = clusterer.fit_predict(X)
        silhouette_avg = silhouette_score(X, cluster_labels)
        print("For n_clusters =", n_clusters,
              "The average silhouette_score is :", silhouette_avg)
        sil.append(silhouette_avg)

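This example applies k-means clustering to a random sample and appends the average silhouette score for each number of clusters to a list, so you can pick the best number of clusters from it. Note that the range starts at 2: silhouette_score raises a ValueError when the labels contain only one cluster, which is the likely cause of the error you get with range(1, 11). If this does not solve it, please provide more information, such as the full error message.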
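Once the list is filled, picking the number of clusters comes down to finding the k with the highest score. The following is only a minimal sketch under some assumptions: it reuses the synthetic make_blobs data from the loop above as a stand-in for your preprocessed X, sets n_init=10 explicitly, and the names best_k and best_score are introduced here purely for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data standing in for the preprocessed X (an assumption).
X, _ = make_blobs(n_samples=500, n_features=2, centers=4,
                  cluster_std=1, center_box=(-10.0, 10.0),
                  shuffle=True, random_state=1)

ks = range(2, 11)  # start at 2: silhouette_score needs at least 2 clusters
sil = []

for k in ks:
    # Fit k-means and get one cluster label per sample
    labels = KMeans(n_clusters=k, n_init=10, random_state=10).fit_predict(X)
    # Average silhouette over all samples, appended to the list
    sil.append(silhouette_score(X, labels))

# Pair each k with its score and keep the pair with the highest silhouette
best_k, best_score = max(zip(ks, sil), key=lambda pair: pair[1])
print("Best k:", best_k, "with average silhouette", round(best_score, 3))
```

Keeping ks and sil as parallel sequences means zip(ks, sil) recovers which k produced which score, so the list you asked about can be used directly for the selection.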