Tags: r, hierarchical-clustering, hclust, dendextend

Displaying hierarchical clusters at cluster level (without cases)


I am interested in visualizing the results of a hierarchical cluster analysis. Is it possible to use a dendrogram to display the names or labels of clusters (and subclusters) without displaying the original cases that went into the cluster analysis?

For example, this code applies a hierarchical cluster analysis to the mtcars dataset.

library(factoextra)  # provides get_dist()

data("mtcars")
# Correlation-based (Pearson) distance, complete linkage
clust <- hclust(get_dist(mtcars, method = "pearson"), method = "complete")
plot(clust)

Let's say I cut the tree at 4 clusters and rename the clusters "sedan", "truck", "sportscar", and "van" (totally arbitrary labels).

clust1 <- cutree(clust, k = 4)
clust1 <- dplyr::recode(clust1,
                        '1' = 'sedan',
                        '2' = 'truck',
                        '3' = 'sportscar',
                        '4' = 'van')

Is it possible to display a dendrogram that shows these four labels as the nodes at the bottom of the tree, suppressing the original car names?

I am also interested in displaying subclusters within clusters in a similar way, but that may be outside the scope of this question. Bonus points if you can also give a suggestion for how to display subclusters within clusters in a dendrogram while suppressing the names of the original cases! :)

Thank you in advance!


Solution

  • Yes, you can do this. I am not familiar with your get_dist, so I will illustrate using the ordinary distance function dist.

    data("mtcars")
    clust <- hclust(dist(mtcars), method = "complete")
    

    To cut off and display just the top of the tree, convert it to a dendrogram, cut it, and keep the upper part of the result. But you need to know what height to cut it at. The merge heights are stored in clust$height.

    tail(clust$height)
    [1] 113.3023 134.8119 141.7044 214.9367 261.8499 425.3447
    

    Since you want four branches, you can cut at any height between the fourth-largest and third-largest merge heights, that is, between 141.7044 and 214.9367. I will use 213.

    MTC_Dend = as.dendrogram(clust)
    TreeTop = cut(MTC_Dend, h = 213)$upper
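
    If you would rather not read the height off the printed values, the same choice can be computed. This is a minimal sketch, assuming k is at least 2 and that the two merge heights involved are not tied; the names k, hts, h_cut and tree_top are mine, and the midpoint is just one arbitrary height inside the valid interval.

    ```r
    data("mtcars")
    clust <- hclust(dist(mtcars), method = "complete")

    k <- 4                                        # desired number of branches
    hts <- sort(clust$height, decreasing = TRUE)  # largest merge heights first

    # Any height strictly between the k-th and (k-1)-th largest merge heights
    # leaves k branches; the midpoint is one arbitrary choice in that interval.
    h_cut <- mean(hts[c(k - 1, k)])

    tree_top <- cut(as.dendrogram(clust), h = h_cut)$upper
    length(labels(tree_top))  # number of branches left in the top of the tree
    ```

    Cutting at h_cut should give the same grouping as cutree(clust, k = k), so the branch count can be checked against that.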
    

    You can get the basic plot now with plot(TreeTop), but its leaves are labelled "Branch 1" to "Branch 4" rather than with the names you want. To change them, use the package dendextend, which provides an assignment method for the labels of a dendrogram.

    library("dendextend")
    
    labels(TreeTop) = c('sedan','truck', 'sportscar', 'van')
    plot(TreeTop)
    

    [Figure: top of the tree with new labels]
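
    One caveat: labels assigned by position follow the left-to-right order of the leaves in the plot, which may not match the cluster numbers that cutree assigns. The sketch below is one way to look up the right name for each branch, and it also covers the subcluster part of the question by cutting lower in the tree. It uses base R's dendrapply instead of the dendextend label assignment, so it does not depend on leaf order. The helper label_top and the height 120 (chosen between 113.3023 and 134.8119 from the heights above, which should leave six branches) are my own assumptions, not part of any package.

    ```r
    data("mtcars")
    clust <- hclust(dist(mtcars), method = "complete")

    cluster_names <- c("sedan", "truck", "sportscar", "van")  # names for cutree ids 1..4
    ids <- cutree(clust, k = 4)  # named integer vector: car name -> cluster id

    # Hypothetical helper: cut the tree at height h and label every remaining
    # branch with the name of the k = 4 cluster that its cars belong to.
    label_top <- function(clust, h, ids, cluster_names) {
      pieces <- cut(as.dendrogram(clust), h = h)
      # pieces$lower[[i]] is the subtree hidden behind leaf "Branch i" of pieces$upper.
      # vapply() stops with an error if a branch spans more than one cluster,
      # which happens when h lies above the four-cluster cut.
      parent <- vapply(pieces$lower, function(b) unique(ids[labels(b)]), integer(1))
      # Repeated names get a suffix: "sedan", "sedan.1", ...
      new_labels <- make.unique(cluster_names[parent])
      dendrapply(pieces$upper, function(node) {
        if (is.leaf(node)) {
          i <- as.integer(sub("Branch ", "", attr(node, "label"), fixed = TRUE))
          attr(node, "label") <- new_labels[i]
        }
        node
      })
    }

    top4 <- label_top(clust, h = 213, ids, cluster_names)  # the four clusters
    top6 <- label_top(clust, h = 120, ids, cluster_names)  # subclusters within them
    plot(top4)
    plot(top6)
    ```

    In the second plot, branches that share a parent cluster carry the same base name with a numeric suffix, so the subclusters can be read off without showing any of the original car names.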