Tags: python, dataframe, machine-learning, cluster-analysis, k-means

How to visually compare clusters using python?


I am working on k-means clustering for customer segmentation. My input data has 12 features and 7315 rows.

I used the following code to run k-means:

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters = 5, init = "k-means++", random_state = 42)
data_normalized['y_kmeans'] = kmeans.fit_predict(data_normalized)

To visualize the clusters, I tried the following code:

u_labels = np.unique(data_normalized['y_kmeans'])
 
#plotting the results:
 
for i in u_labels:
    plt.scatter(data_normalized[y_kmeans == i , 0] , data_normalized[y_kmeans == i , 1] , label = i)
plt.legend()
plt.show()

I got the following error:

TypeError: '(array([False, False, False, ..., False, False, False]), 0)' is an invalid key

InvalidIndexError: (array([False, False, False, ..., False, False, False]), 0)

How can I visualize my clusters to see how far they are from each other?


Solution

  • Since I do not have your dataset, I simulated your dataframe as follows, with two feature columns and 9 different cluster labels:

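    import random
    import pandas as pd

    # Simulated stand-in data: two feature columns (values 0.01-0.99) and 9 cluster labels (1-9)
    d = {'col1': [i/100 for i in random.choices(range(1, 100), k=7315)],
         'col2': [i/100 for i in random.choices(range(1, 100), k=7315)],
         'y_kmeans': random.choices(range(1, 10), k=7315)}
    data_normalized = pd.DataFrame(d)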
    

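    After that, you can plot the clusters with a single plt.scatter call that colors each point by its cluster label (replace col1 and col2 with two of your own feature columns). Your original loop failed because data_normalized[y_kmeans == i, 0] is NumPy-style indexing: a pandas DataFrame's [] operator treats the tuple (mask, 0) as a single column key, which raises the InvalidIndexError. In pandas, select rows and columns with .loc instead, e.g. data_normalized.loc[mask, 'col1'].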

    import numpy as np
    import random
    import pandas as pd
    import matplotlib.pyplot as plt
    
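    # Sorted list of the distinct cluster labels, used as legend text
    u_labels = np.unique(data_normalized['y_kmeans']).tolist()
    
    # One scatter call; each point is colored by its cluster label
    scatter = plt.scatter(data_normalized['col1'], data_normalized['col2'],
                c=data_normalized['y_kmeans'], cmap='tab20')
    # legend_elements() returns one handle per distinct label value here
    plt.legend(handles=scatter.legend_elements()[0], labels=u_labels)
    plt.show()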
    

    I get the following cluster plot:

    [Image: clusters]
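With 12 features, a scatter plot of any two columns shows only one slice of the data, so it can hide how far apart the clusters really are. One common option (not part of the answer above) is to project the points and the k-means centroids onto the first two principal components, and to compute the centroid-to-centroid distances in the full 12-dimensional space. Below is a minimal sketch, assuming scikit-learn is installed. make_blobs only generates hypothetical stand-in data with the question's shape (7315 rows, 12 features), so substitute your own feature matrix (without the y_kmeans column).

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Hypothetical stand-in data with the question's shape: 7315 rows x 12 features
X, _ = make_blobs(n_samples=7315, n_features=12, centers=5, random_state=42)

# Same settings as the question; n_init is set explicitly because its default differs across versions
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Fit PCA on the points, then project the centroids with the same transform
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
centers_2d = pca.transform(kmeans.cluster_centers_)

# Points colored by cluster, centroids drawn as black crosses
scatter = plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap="tab10", s=5)
plt.scatter(centers_2d[:, 0], centers_2d[:, 1], c="black", marker="x", s=100)
plt.legend(handles=scatter.legend_elements()[0], labels=np.unique(labels).tolist())
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()

# Pairwise Euclidean distances between centroids, measured in the full 12-D space
centers = kmeans.cluster_centers_
diff = centers[:, None, :] - centers[None, :, :]
center_dist = np.sqrt((diff ** 2).sum(axis=-1))
print(np.round(center_dist, 2))
```

The 2-D plot is only a projection, so clusters that overlap in it may still be well separated in 12 dimensions. Entry [a, b] of the printed center_dist matrix is the Euclidean distance between centroids a and b, which answers "how far apart" more directly than the picture does.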