I am using fastcluster with SciPy to do agglomerative clustering. I can call dendrogram to plot the dendrogram for the clustering, and fcluster(Z, sqrt(D.max()), 'distance') gives a pretty good clustering for my data. But what if I want to manually inspect a region of the dendrogram where there are, say, k=3 clusters, and then inspect k=6 clusters? How do I get the clustering at a specific level of the dendrogram?
I see all these functions with tolerances, but I don't understand how to convert from a tolerance to a number of clusters. With a simple data set I can build the clustering manually by walking through the linkage matrix (Z) and piecing the clusters together step by step, but this is not practical for large data sets.
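For reference, the manual approach described above does not have to be slow. Each row of the linkage matrix is one merge, so the clustering with k clusters is what you get after replaying the first n - k rows. The sketch below does that with a union-find structure in roughly linear time. It assumes the rows of Z are in merge order with non-decreasing heights (true for monotone methods such as single, complete, average and Ward); `labels_at_k` is a hypothetical helper name, and SciPy's own `linkage` stands in for fastcluster, which produces the same matrix format.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def labels_at_k(Z, k):
    """Replay the first n - k merges of linkage matrix Z; return labels 0..k-1."""
    n = Z.shape[0] + 1  # Z has n - 1 rows for n observations
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Nodes 0..n-1 are the observations; node n + i is the cluster formed by row i.
    parent = list(range(2 * n - 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i in range(n - k):  # each merge reduces the cluster count by one
        new = n + i
        parent[find(int(Z[i, 0]))] = new
        parent[find(int(Z[i, 1]))] = new

    roots = [find(j) for j in range(n)]
    # Relabel the k remaining roots as 0..k-1 in order of first appearance.
    mapping = {}
    return np.array([mapping.setdefault(r, len(mapping)) for r in roots])


# Stand-in data; replace with your own observations.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
Z = linkage(X, method="ward")
labels3 = labels_at_k(Z, 3)
labels6 = labels_at_k(Z, 6)
```

Because both cuts come from the same sequence of merges, every one of the 6 clusters lies entirely inside one of the 3 clusters.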
If you want to cut the tree so that it yields a specific number of clusters, use:
fl = fcluster(cl,numclust,criterion='maxclust')
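where cl is the linkage matrix returned by your linkage method and numclust is the number of clusters you want. Note that 'maxclust' returns at most numclust clusters: if several merges happen at the same height, you may get fewer.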
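Applied to the k=3 and k=6 case from the question, a minimal sketch might look like the following. SciPy's `linkage` on random data stands in for your fastcluster output. It also shows the link between a distance tolerance and a cluster count: assuming a monotone linkage with no tied merge heights, cutting at the height stored in row `-k` of the linkage matrix (column 2) with `criterion='distance'` gives the same k clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))    # stand-in data; replace with your own observations
cl = linkage(X, method="ward")  # or the linkage matrix returned by fastcluster

# One flat clustering per level of interest; labels are 1-based cluster ids.
fl3 = fcluster(cl, 3, criterion="maxclust")
fl6 = fcluster(cl, 6, criterion="maxclust")

# Equivalent distance threshold for k clusters: the height of the last merge
# that still has to happen, i.e. row -k of the linkage matrix.
k = 3
t = cl[-k, 2]
fl3_by_distance = fcluster(cl, t, criterion="distance")
```

The cluster ids may be numbered differently between the two calls, but the partitions into groups are the same.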