I want to run a hierarchical clustering and plot a classic dendrogram with a heatmap. This is reasonably easy using heatmap.2 or heatmap.3 in R, and seems reasonably easy in Python as well. However, what I'm not finding is a nice solution for annotating the tree.
Ideally, I'd like to color-code the branches according to metadata. Say I have ~10k rows of 5 different types; after the clustering I'd like to visualize how these types group together. Labeling each row isn't really feasible given the amount of data.
It doesn't seem impossible to color the tree based on cluster/distance, but that's not really what I want.
The classifying vector for the colors could either be a separate column or part of the row names.
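To make the two options concrete, here is a minimal base R sketch. The names `ids`, `meta`, `type_from_names`, `type_from_column` and the color choices are made up for illustration, and it assumes row names of the form "<type>_<index>":

```r
# Hypothetical row names of the form "<type>_<index>"
ids <- c("A_1", "A_2", "B_1", "B_2", "C_1")

# Option 1: the type is part of the row name; drop everything from the underscore on
type_from_names <- sub("_.*$", "", ids)

# Option 2: the type lives in a separate metadata column, looked up by id
meta <- data.frame(id = ids, type = c("A", "A", "B", "B", "C"),
                   stringsAsFactors = FALSE)
type_from_column <- meta$type[match(ids, meta$id)]

# Either way the result is one type per row, which can be mapped to one color per row
type_colors <- c(A = "red", B = "blue", C = "darkgreen")[type_from_names]
```

Both routes end in the same per-row vector, so whichever annotation method is used only needs that vector, not the original layout of the metadata.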
Either R or Python is fine. Thanks!
Edit:
Example:
library(gplots)
library(proxy)
df = data.frame(matrix(rnorm(100), nrow=10))
rownames(df) <- c("A_1","A_2","A_3","B_1","B_2","B_3","C_1","C_2","C_3","C_4")
df <- t(df)
distance.matrix.df <- dist(as.matrix(df), method='pearson')
clust.df1 <- hclust(distance.matrix.df, method = "average")
dend.dfc <- as.dendrogram(clust.df1)
heatmap.2(as.matrix(df), Rowv=dend.dfc, keysize=1, dendrogram="col", trace="none")
Output: Here
Desired output: Here
In R you could try it like this:
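library(dendextend)
# Uses df and dend.dfc from the question. Cluster the columns, then set the
# branch color from the type given by the first character of each label (A, B, C).
dend <- df %>% t %>% dist %>% hclust %>% as.dendrogram %>%
  branches_attr_by_clusters(as.numeric(as.factor(substr(labels(.), 1, 1))),
                            attr = "col")
heatmap.2(as.matrix(df), Rowv = dend.dfc, Colv = dend, keysize = 1,
          dendrogram = "col", trace = "none")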
which gives you something like this:
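With ~10k leaves the colored branches may become hard to read, so a color strip next to the dendrogram is another option worth trying. The sketch below uses base R's `heatmap()` so it runs without extra packages; `heatmap.2` accepts the same `ColSideColors` argument. It assumes the type is the first character of each column name, as in the example data, and the names `m`, `types`, `type_palette` and `side_cols` as well as the colors are arbitrary choices for illustration:

```r
set.seed(1)  # only to make the random example reproducible
m <- matrix(rnorm(100), nrow = 10)
colnames(m) <- c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3",
                 "C_1", "C_2", "C_3", "C_4")

# One color per column, looked up from the type prefix of the column name
types <- substr(colnames(m), 1, 1)
type_palette <- c(A = "red", B = "blue", C = "darkgreen")  # arbitrary colors
side_cols <- unname(type_palette[types])

# ColSideColors draws a colored strip between the column dendrogram and the heatmap;
# the strip is reordered together with the columns
heatmap(m, ColSideColors = side_cols)
```

For row-wise metadata, as in the 10k-row case, `RowSideColors` works the same way with one color per row.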