How do we measure the accuracy of a K-Means clustering (say, by generating a confusion matrix), given that the automatically assigned cluster indices are probably a permutation of the original labels?
I'm not entirely sure what you mean either. Your original labels are presumably the ground-truth labeling. The clustering result from k-means is usually an integer cluster index per sample, with as many distinct values as the k clusters you asked the algorithm to produce.
I typically use the pandas.crosstab function to cross-tabulate the ground-truth labeling against the k-means labeling, which shows where the samples of each ground-truth class end up among the clusters.
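As a minimal sketch of that cross-tabulation: the data below is made up for illustration, and the names y_true and y_kmeans are hypothetical stand-ins for your own ground-truth labels and k-means output.

```python
import pandas as pd

# Hypothetical data: ground-truth class names and k-means cluster indices
y_true = ["a", "a", "a", "b", "b", "c", "c", "c"]
y_kmeans = [2, 2, 0, 0, 0, 1, 1, 1]

# Rows are ground-truth classes, columns are k-means clusters,
# and each cell counts the samples that fall into that pair.
crosstab_groundtruth_kmeans = pd.crosstab(
    pd.Series(y_true, name="groundtruth"),
    pd.Series(y_kmeans, name="kmeans"),
)
print(crosstab_groundtruth_kmeans)
```

A row whose counts concentrate in a single column indicates a class that k-means recovered as one cluster, whatever index that cluster happens to have.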
For better visualization, you may want to use the following:
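import matplotlib.pyplot as plt
import seaborn as sns

# crosstab_groundtruth_kmeans is the table returned by pandas.crosstab
plt.figure(figsize=(30, 10))
# Plot the cross-tabulation as an annotated heatmap (transposed, so clusters are on the y-axis)
ax = sns.heatmap(crosstab_groundtruth_kmeans.T,
                 square=True, annot=True, fmt='.2f')
# Keep the y-axis tick labels horizontal
ax.set_yticklabels(ax.get_yticklabels(), rotation=0)
plt.show()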
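If you also want a single accuracy number despite the permuted cluster indices, one common approach (not something the cross-tabulation does for you) is to pick the one-to-one cluster-to-label mapping that maximizes the matched counts, using scipy.optimize.linear_sum_assignment. The sketch below uses made-up data with hypothetical names and assumes the number of clusters equals the number of ground-truth classes.

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment

# Hypothetical data: integer ground-truth labels and k-means cluster indices
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_kmeans = np.array([2, 2, 0, 0, 0, 1, 1, 1])

# Contingency table: rows are ground-truth labels, columns are clusters
ct = pd.crosstab(y_true, y_kmeans)

# Choose the one-to-one pairing of labels and clusters with the largest
# total count (assumes as many clusters as ground-truth classes)
row_idx, col_idx = linear_sum_assignment(ct.to_numpy(), maximize=True)
cluster_to_label = {ct.columns[c]: ct.index[r] for r, c in zip(row_idx, col_idx)}

# Relabel the clusters, then build the confusion matrix and accuracy
y_aligned = np.array([cluster_to_label[c] for c in y_kmeans])
confusion = pd.crosstab(y_true, y_aligned,
                        rownames=["groundtruth"], colnames=["kmeans_aligned"])
accuracy = (y_aligned == y_true).mean()

print(confusion)
print(accuracy)
```

After relabeling, the diagonal of the confusion matrix holds the samples that agree with the ground truth, so the result reads like an ordinary classification confusion matrix.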
Good luck!~