How to tell which cluster number corresponds to which known class

This may be a dumb question, but I can't find anything on the subject.

I have 3 known classes (plant varieties) in my data, and I performed a hierarchical cluster analysis. When I compare the clusters to the known classes, I get the following table:

cut.complete <- cutree(cluster.complete,k=3)
cc <- table(variety,cut.complete) 
cc
         cut.complete
variety    1  2  3
  AK      46 13  0
  AF       2 18 50
  GH       0 26 21

How do I know that cluster 2 is the cluster that corresponds to the known AF class? For example, could cluster 3 correspond to AF instead?

If clusters 1, 2 and 3 do not correspond to the true varieties AK, AF and GH respectively, then I cannot use the formula

100*round(sum(diag(cc))/sum(cc), digits=3)

to calculate the percentage of correctly classified samples.

Thank you.

Solution

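In this case, your cluster label 3 matches the ground-truth variety AF (50 samples) more than it matches GH (21), and cluster label 2 matches GH (26) more than it matches AF (18), while cluster 1 clearly matches AK (46). So the rule is: match each cluster label with the ground-truth class it shares the most samples with. Because clusters 2 and 3 are swapped relative to the row order, the diagonal of your table does not hold the correctly classified counts, and sum(diag(cc)) cannot be used as is.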
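Applied to the table in the question, the row-maximum rule can be sketched in base R as below. The table is re-entered by hand from the printed output, and the names cluster_for, naive and accuracy are only illustrative. Note that taking the maximum of each row is not guaranteed to give a one-to-one matching (two varieties could pick the same cluster); here it happens to be one-to-one.

```r
# The question's contingency table, re-entered by hand
# (rows = known variety, columns = cluster from cutree)
cc <- matrix(c(46, 13,  0,
                2, 18, 50,
                0, 26, 21),
             nrow = 3, byrow = TRUE,
             dimnames = list(variety = c("AK", "AF", "GH"),
                             cut.complete = c("1", "2", "3")))

# Naive formula: silently assumes cluster k corresponds to row k
naive <- 100 * round(sum(diag(cc)) / sum(cc), digits = 3)
naive        # 48.3

# For each variety, the cluster holding most of its samples (row maximum)
cluster_for <- apply(cc, 1, which.max)
cluster_for
# AK AF GH
#  1  3  2

# Pick out the matched cells (row i, column cluster_for[i]) and count them
matched <- cc[cbind(seq_len(nrow(cc)), cluster_for)]
accuracy <- 100 * round(sum(matched) / sum(cc), digits = 3)
accuracy     # 69.3, i.e. (46 + 50 + 26) / 176
```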

The following example uses matchClasses() from the e1071 package to do this automatically: each actual (ground-truth) class label is matched with the cluster label that shares the maximum number of data points in that row. For instance, cluster 3 is matched with class AK because, in the AK row, the maximum count (130) is in column 3.

tab
       cut.complete
variety   1   2   3
     AF 110 125  82
     AK  93 102 130
     GH 129 103 126

library(e1071)
matchClasses(tab) # find which cluster labels match with which class labels

Cases in matched pairs: 38.4 %
AF AK GH 
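 2  3  1

That is, AF is matched with cluster 2, AK with cluster 3 and GH with cluster 1, and 38.4 % = (125 + 130 + 129) / 1000 is the share of samples that fall in the matched cells.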
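If you want a strictly one-to-one matching, so that the columns can be reordered and the diagonal formula from the question used directly, one option for small tables is to try every column permutation and keep the one with the most matched samples. The sketch below assumes a square table (as many clusters as classes) and few enough clusters to enumerate all permutations; perms() and best_match() are hypothetical helpers, not part of e1071. For this kind of one-to-one matching, matchClasses() also accepts method = "greedy" or method = "exact" (see ?matchClasses).

```r
# Hypothetical helper: all permutations of 1:n as rows of a matrix.
# Simple recursion, only sensible for small n (n! rows).
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  sub <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(first) {
    rest <- setdiff(seq_len(n), first)
    # relabel the (n-1)-permutations with the remaining values
    cbind(first, matrix(rest[c(sub)], nrow = nrow(sub)))
  }))
}

# Hypothetical helper: one-to-one matching of rows (classes) to columns
# (clusters) that maximises the number of samples in the matched cells.
best_match <- function(tab) {
  stopifnot(nrow(tab) == ncol(tab))      # assumes a square table
  p <- perms(ncol(tab))
  scores <- apply(p, 1, function(cols)
    sum(tab[cbind(seq_len(nrow(tab)), cols)]))
  setNames(as.integer(p[which.max(scores), ]), rownames(tab))
}

tab <- matrix(c(110, 125,  82,
                 93, 102, 130,
                129, 103, 126),
              nrow = 3, byrow = TRUE,
              dimnames = list(variety = c("AF", "AK", "GH"),
                              cut.complete = c("1", "2", "3")))

m <- best_match(tab)
m
# AF AK GH
#  2  3  1

# Reorder the columns so each class sits on the diagonal with its cluster;
# now the question's formula gives the matched percentage
aligned <- tab[, m]
pct <- 100 * round(sum(diag(aligned)) / sum(aligned), digits = 3)
pct   # 38.4
```

On tab this reproduces the matchClasses() result above; on the question's cc it gives AK → 1, AF → 3, GH → 2.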