With SciPy, how do I get the clustering for a given k when doing hierarchical clustering?


I am using fastcluster with SciPy to do agglomerative clustering. I can call dendrogram to plot the dendrogram for the clustering, and fcluster(Z, sqrt(D.max()), 'distance') gives a pretty good clustering for my data. But what if I want to manually inspect the level of the dendrogram where there are, say, k=3 clusters, and then the level where there are k=6 clusters? How do I get the clustering at a specific level of the dendrogram?

I see all these functions that take tolerances, but I don't understand how to convert from a tolerance to a number of clusters. For a simple data set I can build the clustering manually by going through the linkage (Z) and piecing the clusters together step by step, but this is not practical for large data sets.


Solution

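  • If you want to cut the tree so that it yields a specific number of clusters, use:

    fl = fcluster(cl, numclust, criterion='maxclust')

    where cl is the linkage matrix returned by your linkage method and numclust is the number of clusters you want. Note that 'maxclust' produces at most numclust flat clusters, so tied merge heights can leave you with fewer. The result fl is an array holding the flat cluster label of each observation.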
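
As a minimal sketch of the fcluster call above, the following cuts one linkage at k=3 and then at k=6. The data set, the variable names, and the choice of average linkage are assumptions made for illustration. It uses scipy.cluster.hierarchy.linkage, and fastcluster.linkage should be usable in its place since it returns a linkage matrix in the same format. The last part addresses the tolerance question: for a monotonic linkage, any 'distance' threshold between the merge heights Z[-k, 2] and Z[-(k-1), 2] should give k clusters (for k >= 2).

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic data (an assumption for illustration): six tight 2-D blobs
# arranged as three well-separated pairs.
rng = np.random.default_rng(0)
centers = np.array(
    [[0, 0], [0, 1], [10, 0], [10, 1], [20, 0], [20, 1]], dtype=float
)
X = np.vstack([c + 0.05 * rng.standard_normal((20, 2)) for c in centers])

# Build the hierarchy once; the linkage method is an arbitrary choice here.
Z = linkage(X, method="average")

# Cut the same tree at two different levels.
labels_k3 = fcluster(Z, t=3, criterion="maxclust")
labels_k6 = fcluster(Z, t=6, criterion="maxclust")
n3 = len(np.unique(labels_k3))
n6 = len(np.unique(labels_k6))
print("clusters at k=3:", n3)
print("clusters at k=6:", n6)

# Converting k into a distance tolerance: column 2 of Z holds the merge
# heights in increasing order, so a threshold between the k-th and the
# (k-1)-th highest merges leaves exactly k clusters (valid for k >= 2).
k = 3
threshold = 0.5 * (Z[-k, 2] + Z[-(k - 1), 2])
labels_by_distance = fcluster(Z, t=threshold, criterion="distance")
n_by_distance = len(np.unique(labels_by_distance))
print("clusters at threshold %.3f:" % threshold, n_by_distance)
```

Because the tree is built only once, inspecting another level is just another fcluster call on the same Z.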