python scikit-learn cluster-analysis

sklearn spectral clustering returns fewer clusters than requested


from sklearn.cluster import SpectralClustering
import numpy as np
test = np.array([[63.15907836],
       [69.67386298],
       [67.20030411],
       [66.25165771],
       [62.21031327],
       [55.09531565],
       [65.85034014],
       [52.99841912],
       [52.04523986],
       [52.09008007],
       [94.65364516]])
clustering = SpectralClustering(n_clusters = 4).fit(test)
clustering.labels_

The code above returns array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1], dtype=int32), which surprised me. Spectral clustering requires the number of clusters to be set, which I did, yet I get only two clusters. What am I missing?


Solution

  • Depending on the initialization, spectral clustering (like k-means) can end up with empty clusters, so fewer distinct labels appear than n_clusters.

    For instance, setting random_state to 17 leads to 4 clusters:

    clustering = SpectralClustering(n_clusters = 4, random_state=17).fit(test)
    

    You can find an illustration of this for k-means (by default, scikit-learn's spectral clustering uses k-means to assign the final labels): http://user.ceng.metu.edu.tr/~tcan/ceng465_f1314/Schedule/KMeansEmpty.html
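
To see how much the result depends on the seed, you can count the distinct labels for a handful of random_state values. This is only a sketch: the helper names (distinct_label_count, label_counts_per_seed) and the seed range are made up for illustration, and the counts you get may differ between scikit-learn versions, so no output is shown.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Same data as in the question.
test = np.array([[63.15907836], [69.67386298], [67.20030411], [66.25165771],
                 [62.21031327], [55.09531565], [65.85034014], [52.99841912],
                 [52.04523986], [52.09008007], [94.65364516]])


def distinct_label_count(labels):
    """Number of clusters that actually received at least one point."""
    return len(np.unique(labels))


def label_counts_per_seed(data, n_clusters, seeds):
    """Fit once per seed and map seed -> number of non-empty clusters."""
    counts = {}
    for seed in seeds:
        model = SpectralClustering(n_clusters=n_clusters, random_state=seed)
        counts[seed] = distinct_label_count(model.fit(data).labels_)
    return counts


# The seed range is an arbitrary choice for illustration.
counts = label_counts_per_seed(test, n_clusters=4, seeds=range(5))
for seed, n_found in counts.items():
    print(f"random_state={seed}: {n_found} non-empty cluster(s)")
```

Any seed that reports fewer than 4 non-empty clusters hit the empty-cluster case described above. Checking the count this way is more reliable than reading the labels_ array by eye.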