Tags: r, hierarchical-clustering, dendrogram, unsupervised-learning, dendextend

How to find the number of clusters when cutting a tree at a certain height in R


I want to find the number of clusters when cutting a tree at given heights.

The tree is of class "dendrogram" in R and so I have been using the package dendextend to explore this.

Example:

library(dendextend)  # also makes the %>% pipe available
# Create a dendrogram:
dend <- 1:5 %>% dist %>% hclust %>% as.dendrogram
# Plot it:
dend %>% plot

I want to find how many clusters there are when I specify, for example, "height = 3" (see y-axis in the generated plot).

At height 3 the answer should be 2: a horizontal line drawn at that height crosses two vertical branches, so the cut produces two clusters.

At height 1.5 the answer should be 3, because the line crosses three branches, and so on.
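
The "count the branches crossed" rule can also be stated in terms of merge heights: every merge that happens above the cut height keeps its two subtrees apart, so the number of clusters is one plus the number of merges strictly above h. Below is a minimal sketch, assuming the tree converts cleanly with as.hclust() so its merge heights are available; count_clusters_at() is a hypothetical helper name, not part of any package.

```r
library(dendextend)  # re-exports the %>% pipe used below

dend <- 1:5 %>% dist %>% hclust %>% as.dendrogram

# Hypothetical helper: number of clusters produced by a horizontal cut at h.
# A tree with n leaves has n - 1 merges; each merge above h leaves its two
# subtrees separated, so the count is 1 + (number of merges above h).
count_clusters_at <- function(tree, h) {
  merge_heights <- as.hclust(tree)$height
  1 + sum(merge_heights > h)
}

count_clusters_at(dend, 3)    # 2
count_clusters_at(dend, 1.5)  # 3
```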

I am using an object of class dendrogram because my raw data is in Newick format, and read.dendrogram() is the only function I have found that parses this tree. I have also used as.hclust() to convert it to the hclust class, but I still can't find an answer.
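
Once as.hclust() succeeds, base R's stats::cutree() should already be enough: it accepts a vector of heights and returns a matrix with one column of cluster IDs per height. A sketch, using the 1:5 example as a stand-in for the tree parsed from Newick (assumption: the parsed tree is binary and its heights increase toward the root, which as.hclust() and cutree() expect).

```r
# Stand-in for the tree parsed from Newick (assumed binary, with heights
# increasing toward the root)
dend <- as.dendrogram(hclust(dist(1:5)))

hc <- as.hclust(dend)  # stats' as.hclust method for dendrogram objects

# cutree() takes several heights at once and returns one column of
# cluster IDs per height
memberships <- stats::cutree(hc, h = c(3, 1.5))

# Cluster IDs run from 1 to k, so each column's maximum is its cluster count
counts <- apply(memberships, 2, max)
counts  # 2 clusters at h = 3, 3 clusters at h = 1.5
```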

Also, if anyone knows how to plot the clusters generated by specifying height, that would help.
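
For plotting, dendextend's color_branches() takes the same h argument as cutree() and colours the branches of each resulting cluster; abline() can then mark the cut height on the dendrogram's height axis. A minimal sketch, with rect.hclust() from base R shown as an alternative for the hclust version of the tree; the cut height of 3 is just the example value from the question.

```r
library(dendextend)

dend <- 1:5 %>% dist %>% hclust %>% as.dendrogram
cut_height <- 3

# Colour the branches of each cluster obtained by cutting at cut_height
dend_colored <- color_branches(dend, h = cut_height)
plot(dend_colored)
abline(h = cut_height, lty = 2)  # dashed line where the tree is cut

# Base-R alternative: draw a box around each cluster of the hclust tree
hc <- as.hclust(dend)
plot(hc)
rect.hclust(hc, h = cut_height)
```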


Solution

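  • Use cutree(), which dendextend extends to work on dendrogram objects. It returns one cluster ID per leaf, so counting the distinct IDs gives the number of clusters: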

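    library(dendextend)  # cutree() method for dendrograms, plus the %>% pipe
    dend <- 1:5 %>% dist %>% hclust %>% as.dendrogram
    # Count the distinct cluster IDs produced by cutting at height h
    length(unique(cutree(dend, h = 1.5)))  # 3
    length(unique(cutree(dend, h = 3)))    # 2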
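
To see the count at several heights at once, the expression from the answer can be wrapped in sapply(). A sketch; the heights in the vector are arbitrary example values.

```r
library(dendextend)

dend <- 1:5 %>% dist %>% hclust %>% as.dendrogram

heights <- c(0.5, 1.5, 3, 5)  # arbitrary cut heights to inspect

# For each height, cut the tree and count the distinct cluster IDs
n_clusters <- sapply(heights, function(h) length(unique(cutree(dend, h = h))))
names(n_clusters) <- heights
n_clusters
```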