python scikit-learn cluster-analysis k-means multilabel-classification

How to evaluate K-Means clustering when the automatic cluster indexes don't match the true labels?

How do we measure the accuracy of a K-Means clustering algorithm (say, by generating a confusion matrix) when the automatically assigned cluster indexes are probably a permutation of the original labels?

Solution

I'm not entirely sure what you mean either. Your original labels are presumably the ground-truth labeling. In scikit-learn, k-means assigns each sample an integer cluster index from 0 to k − 1, where k is the number of clusters you ask the algorithm for. These indexes are arbitrary, so cluster 0 need not correspond to true label 0.
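A minimal sketch of that point, assuming scikit-learn's KMeans and the iris dataset as stand-in data (neither the data nor the parameter values come from the question). It shows that labels_ holds integers in 0..k − 1, and that renaming the clusters leaves the partition unchanged, which is why a permutation-invariant score such as adjusted_rand_score rates both labelings as identical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

# Stand-in data (assumption): the iris dataset, which has 3 true classes.
X, y_true = load_iris(return_X_y=True)

k = 3
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Each sample gets an integer cluster index in the range 0..k-1.
print(np.unique(labels))

# The indexes are arbitrary: a hypothetical relabeling 0->2, 1->0, 2->1
# describes exactly the same grouping of the samples.
permutation = np.array([2, 0, 1])
relabeled = permutation[labels]

# A permutation-invariant score treats both labelings as identical.
print(adjusted_rand_score(labels, relabeled))  # 1.0
```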

I typically use the pandas.crosstab function to cross-tabulate the ground-truth labels against the k-means labels; the resulting table shows how the samples of each true class are distributed across the clusters.
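A minimal sketch of building that cross-tabulation, assuming the iris dataset as stand-in ground truth and KMeans with k = 3 (both are illustrative choices, not from the question). The variable name crosstab_groundtruth_kmeans matches the one the plotting code uses.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Stand-in data (assumption): iris, with its 3 species as ground truth.
X, y_true = load_iris(return_X_y=True)
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Rows: ground-truth classes; columns: k-means cluster indexes.
# Each cell counts the samples of one true class assigned to one cluster.
crosstab_groundtruth_kmeans = pd.crosstab(
    y_true, kmeans_labels, rownames=["ground truth"], colnames=["k-means"]
)
print(crosstab_groundtruth_kmeans)
```

A clean class-to-cluster correspondence shows up as a single dominant cell in each row, whatever the cluster indexes happen to be.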

For better visualization, you may want to use the following:

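import seaborn as sns
import matplotlib.pyplot as plt

# crosstab_groundtruth_kmeans is assumed to be the DataFrame returned by
# pd.crosstab(ground_truth_labels, kmeans_labels)

plt.figure(figsize=(30, 10))

# plot the cross-tabulation (transposed, so clusters are rows) as an annotated heatmap
ax = sns.heatmap(crosstab_groundtruth_kmeans.T,
                 square=True, annot=True, fmt='.2f')

# keep the row labels horizontal so they stay readable
ax.set_yticklabels(
    ax.get_yticklabels(),
    rotation=0)

plt.show()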

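The cross-tabulation shows which cluster corresponds to which class, but it does not by itself give the confusion matrix or accuracy the question asks for. One common option, not part of the approach above, is to match clusters to classes with the Hungarian algorithm (scipy.optimize.linear_sum_assignment) and relabel before scoring. A minimal sketch, again assuming iris as stand-in data and that the number of clusters equals the number of classes:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix

# Stand-in data (assumption): iris, 3 classes, k = 3.
X, y_true = load_iris(return_X_y=True)
y_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Contingency table: rows = true classes, columns = cluster indexes.
cm = confusion_matrix(y_true, y_kmeans)

# Hungarian algorithm: find the one-to-one cluster->class mapping that
# maximizes the matched counts (by minimizing the negated counts).
row_ind, col_ind = linear_sum_assignment(-cm)
mapping = np.empty(cm.shape[1], dtype=int)
mapping[col_ind] = row_ind  # mapping[cluster] = matched class

# Relabel the clusters with their matched classes, then score as usual.
y_mapped = mapping[y_kmeans]
print(confusion_matrix(y_true, y_mapped))
accuracy = accuracy_score(y_true, y_mapped)
print(accuracy)
```

This sketch assumes k equals the number of classes; with a different k the matching is no longer one-to-one, and a permutation-invariant score such as adjusted_rand_score or normalized_mutual_info_score may be a better fit.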

Good luck!~