python, cluster-analysis, metrics, unsupervised-learning

How to interpret Python clustering scores?


I am trying to use agglomerative clustering to cluster some data, but I don't know which number of clusters is best. Here are my results:

[Figure: several evaluation scores in percent (y-axis) plotted against the number of clusters (x-axis)]

The dataset consists of 65 classes to be recognized. Gini value = 0.265.

  1. What should be chosen as the number of clusters? Maybe the same as the number of classes?
  2. What does the intersection point of completeness, homogeneity, and V-measure mean?
  3. What does the maximum of the adjusted mutual info score mean?

Solution

    1. Don't use these measures to choose k, because they compare the clustering against a known solution. If you already have a known solution, why settle for an approximation of it?

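    2. The point has no special meaning for choosing k. If you study the equations, you will see why the curves agree there: V-measure is the harmonic mean of homogeneity and completeness, so wherever those two curves cross, the V-measure must pass through the same value.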

    3. For AMI, NMI, ARI, etc., the maximum is the k whose clustering agrees most with your existing labeled solution.
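
To make the point about the equations concrete, here is a minimal pure-Python sketch of homogeneity, completeness, and V-measure (with beta = 1) computed from label entropies. The function names `entropy`, `conditional_entropy`, and `homogeneity_completeness_v` and the toy label lists are my own, chosen for illustration; in practice you would probably call `homogeneity_score`, `completeness_score`, and `v_measure_score` from `sklearn.metrics` instead.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equally long label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def homogeneity_completeness_v(classes, clusters):
    """Return (homogeneity, completeness, V-measure) with beta = 1."""
    h_classes = entropy(classes)
    h_clusters = entropy(clusters)
    # Homogeneity: each cluster contains members of a single class only.
    hom = 1.0 if h_classes == 0 else (
        1.0 - conditional_entropy(classes, clusters) / h_classes
    )
    # Completeness: all members of a class end up in the same cluster.
    com = 1.0 if h_clusters == 0 else (
        1.0 - conditional_entropy(clusters, classes) / h_clusters
    )
    # V-measure: harmonic mean of the two, so it equals both when hom == com.
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v


if __name__ == "__main__":
    # Hypothetical toy data: 3 known classes, clustered with k = 1, 3, 6.
    classes = [0, 0, 1, 1, 2, 2]
    for name, clusters in [
        ("k=1", [0, 0, 0, 0, 0, 0]),
        ("k=3", [2, 2, 0, 0, 1, 1]),
        ("k=6", [0, 1, 2, 3, 4, 5]),
    ]:
        hom, com, v = homogeneity_completeness_v(classes, clusters)
        print(f"{name}: homogeneity={hom:.3f} completeness={com:.3f} v={v:.3f}")
```

With too few clusters completeness is high and homogeneity is low; with too many it is the other way around. The two curves therefore tend to cross somewhere in between, and the harmonic mean is forced through that crossing. All three scores still need the true class labels as input, which is why they cannot tell you k on unlabeled data.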