Tags: r, plot, hierarchical-clustering, dendrogram

How to draw a cut line on a dendrogram at the best K


How do I draw a line on a dendrogram that corresponds to the best K for a given criterion?

Like this:

[Image: dendrogram with a horizontal cut line]

Let's suppose that this is my dendrogram, and the best K is 4.

data("mtcars")
myDend <-  as.dendrogram(hclust(dist(mtcars))) 
plot(myDend)

I know that the abline function can draw lines on a plot like the one shown above. However, I don't know how to calculate the height to pass to it, as in abline(h = myHeight).


Solution

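  • The information you need is already in the object returned by hclust: its height component stores the height of each merge, in increasing order for the default complete linkage. With n observations there are n - 1 merges, so to get 4 clusters you draw the line between the 3rd largest and 4th largest heights, that is, between height[n-k+1] and height[n-k] with k = 4.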

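    HC <- hclust(dist(mtcars))
    myDend <- as.dendrogram(HC)
    
    par(mar = c(7.5, 4, 2, 2))  # widen the bottom margin so the leaf labels fit
    plot(myDend)
    
    k <- 4
    n <- nrow(mtcars)
    # HC$height holds the n - 1 merge heights in increasing order, so a cut
    # between entries n - k and n - k + 1 leaves exactly k clusters.
    MidPoint <- (HC$height[n - k] + HC$height[n - k + 1]) / 2
    abline(h = MidPoint, lty = 2)  # dashed horizontal line at the cut height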
    

    [Image: resulting dendrogram with the dashed cut line]
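
To check that a midpoint height really separates the tree into K clusters, one option is to wrap the calculation in a small function and compare it against cutree from base R. This is only a sketch: cut_height is a hypothetical helper name that does not appear in the answer, and it assumes the merge heights are strictly increasing around the cut (tied heights would make the midpoint ambiguous).

```r
# Hypothetical helper (not part of the original answer): midpoint between the
# two merge heights that separate k clusters from k - 1 clusters.
cut_height <- function(hc, k) {
  n <- length(hc$height) + 1          # n observations give n - 1 merges
  stopifnot(k >= 2, k <= n - 1)       # both neighbouring heights must exist
  (hc$height[n - k] + hc$height[n - k + 1]) / 2
}

HC <- hclust(dist(mtcars))
h4 <- cut_height(HC, 4)

# Cutting the tree at that height should give the same grouping as asking
# cutree() for 4 clusters directly.
clusters <- cutree(HC, h = h4)
table(clusters)
```

The same h4 can then be passed to abline(h = h4, lty = 2) after plotting the dendrogram.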