
R/Python: Hierarchical clustering, dendrogram annotation


I want to run a hierarchical clustering and plot a classic dendrogram with a heatmap. This is reasonably easy using heatmap.2 or heatmap.3 in R, and seems reasonably easy in Python as well. However, what I haven't found is a good way to annotate the tree.

Ideally, I'd like to color-code my branches according to metadata. Say I have ~10k rows of 5 different types; after the clustering, I'd like to visualize how these types group together. Labeling each row isn't really feasible given the amount of data.

It doesn't seem impossible to color the tree based on cluster/distance, but that's not really what I want.

The classifying vector for the colors could be either a separate column or part of the row names.

A solution in either R or Python is fine. Thanks!

Edit:

Example:

library(gplots)
library(proxy)
df = data.frame(matrix(rnorm(100), nrow=10))
rownames(df) <- c("A_1","A_2","A_3","B_1","B_2","B_3","C_1","C_2","C_3","C_4")
df <- t(df)
distance.matrix.df <- dist(as.matrix(df), method='pearson')
clust.df1 <- hclust(distance.matrix.df, method = "average")
dend.dfc <- as.dendrogram(clust.df1)
heatmap.2(as.matrix(df), Rowv=dend.dfc, keysize=1, dendrogram="col", trace="none")

Output: Here

Desired output: Here


Solution

  • In R you could try it like this:

    library(dendextend)
    # Color branches by the first character of each label (A, B, C)
    dend <- df %>% t %>% dist %>% hclust %>% as.dendrogram %>%
      branches_attr_by_clusters(as.numeric(as.factor(substr(labels(.), 1, 1))),
                                attr="col")
    heatmap.2(as.matrix(df), Rowv=dend.dfc, Colv=dend, keysize=1,
              dendrogram="col", trace="none")
    

    which gives you something like this:

    [Image: heatmap with the column dendrogram's branches colored by group]
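
Since the question also allows Python, here is a rough sketch of the same idea there. It is not taken from the answer above: `link_colors`, the toy labels, the palette, and the hand-written linkage rows are all made up for illustration. The only assumption about the clustering is that it is given in SciPy's linkage format (one row per merge: left id, right id, height, size). A branch gets a class color only when every leaf under it belongs to one class; mixed branches fall back to gray.

```python
def link_colors(linkage, leaf_classes, palette, mixed="gray"):
    """Map each internal node id to a color.

    linkage: rows in SciPy's linkage format (left_id, right_id, height, size).
    leaf_classes: class label of each leaf, indexed by leaf id.
    palette: dict mapping class label -> color string.
    A link gets its class color only if every leaf under it shares one class.
    """
    n = len(leaf_classes)
    # Set of classes found under each node; leaves are ids 0..n-1.
    below = {i: {c} for i, c in enumerate(leaf_classes)}
    colors = {}
    for i, row in enumerate(linkage):
        left, right = int(row[0]), int(row[1])
        node = n + i  # the i-th merge creates node id n + i
        below[node] = below[left] | below[right]
        if len(below[node]) == 1:
            colors[node] = palette[next(iter(below[node]))]
        else:
            colors[node] = mixed
    return colors


# Hypothetical toy data: the class is the label prefix, as in the question.
labels = ["A_1", "A_2", "B_1", "B_2"]
classes = [name.split("_")[0] for name in labels]
palette = {"A": "red", "B": "blue"}

# Hand-written linkage: leaves 0 and 1 merge, then 2 and 3, then both clusters.
Z = [
    [0, 1, 0.5, 2],
    [2, 3, 0.7, 2],
    [4, 5, 2.0, 4],
]

colors = link_colors(Z, classes, palette)
print(colors)
```

With a real linkage matrix from `scipy.cluster.hierarchy.linkage`, the resulting dict can be handed to `scipy.cluster.hierarchy.dendrogram` through its `link_color_func` argument, for example `link_color_func=lambda k: colors[k]`. With ~10k rows this keeps the tree readable without labeling individual leaves.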