python, scikit-learn, k-means

How to get the SSE for each cluster in k-means?


I am using KMeans from sklearn.cluster and trying to get the SSE for each cluster. I understand that kmeans.inertia_ gives the sum of the SSEs over all clusters. Is there any way to get the SSE for each cluster individually?

I have a dataset with 7 attributes and 210 observations. The number of clusters is 3, and I would like to compute the SSE for each cluster.


Solution

  • There is no direct way to do this using a KMeans object. However, you can easily compute the sum of squared distances for each cluster yourself.

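    import numpy as np
    from sklearn.cluster import KMeans
    
    # ...
    # X is assumed to be a NumPy array of shape (n_samples, n_features)
    
    kmeans = KMeans(n_clusters=3).fit(X)
    
    # Centroid of each cluster: the mean of the points assigned to it
    cluster_centers = [X[kmeans.labels_ == i].mean(axis=0) for i in range(3)]
    
    # Add each point's squared distance to its own cluster's centroid
    clusterwise_sse = [0, 0, 0]
    for point, label in zip(X, kmeans.labels_):
        clusterwise_sse[label] += np.square(point - cluster_centers[label]).sum()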
    

    This snippet is not the most efficient way to do this: it loops over the points one at a time in Python, because my goal was to present the concept clearly.
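
For larger datasets, the same per-cluster sums can be computed without a Python loop. The following is only a sketch under stated assumptions: `X` here is random stand-in data with the shape described in the question (210 x 7), not the asker's dataset, and `n_init` and `random_state` are set only to make the run repeatable. It uses the fitted `cluster_centers_` attribute instead of recomputing the means, and `np.bincount` with `weights` to sum the squared distances by label.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in data: 210 observations, 7 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(210, 7))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Squared distance from every point to the center of its assigned cluster.
# Indexing cluster_centers_ by labels_ gives one center row per point.
sq_dist = np.square(X - kmeans.cluster_centers_[kmeans.labels_]).sum(axis=1)

# Sum the squared distances separately for each cluster label.
clusterwise_sse = np.bincount(
    kmeans.labels_, weights=sq_dist, minlength=kmeans.n_clusters
)

print(clusterwise_sse)                        # one SSE value per cluster
print(clusterwise_sse.sum(), kmeans.inertia_) # the two totals should agree
```

Comparing the total with `kmeans.inertia_` is a quick sanity check, since inertia is the sum of squared distances of the samples to their closest cluster center.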