Tags: r, colors, dataframe, dendrogram, dendextend

Coloring dendrogram’s end branches (or leaves) based on column number of data frame in R


From a data frame data.main, I can generate an hclust dendrogram as follows:

aa1 <- c(2, 4, 6, 8)
bb1 <- c(1, 3, 7, 11)
aa2 <- c(3, 6, 9, 12)
bb2 <- c(3, 5, 7, 9)
data.main <- data.frame(aa1, bb1, aa2, bb2)
d1 <- dist(t(data.main))  # distances between columns, not rows
hcl1 <- hclust(d1)
plot(hcl1)

I know there are ways to color the branches or leaves using a tree cutoff. However, is it possible to color them based on partial column names or column numbers instead? For example, I want the branches corresponding to aa1 and aa2 to be red, and those for bb1 and bb2 to be blue.

I have checked the R package dendextend but am still not able to find a direct/easy way to get the desired result.

Figure: dendrogram with aa1 and bb2 clustered most closely. Then bb1 is next closest, followed by aa2. The labels and branches are colored based on the label. Those starting with "aa" are red and those starting with "bb" are blue.


Solution

  • It's easier to change colors on a dendrogram than on an hclust object, and converting between the two is straightforward. You can do:

    drg1 <- dendrapply(as.dendrogram(hcl1, hang = 0.1), function(n) {
      if (is.leaf(n)) {
        # pick the color from the first letter of the leaf label
        labelCol <- c(a = "red", b = "blue")[substr(attr(n, "label"), 1, 1)]
        attr(n, "nodePar") <- list(pch = NA, lab.col = labelCol)  # color the label, draw no leaf point
        attr(n, "edgePar") <- list(col = labelCol)  # color the branch as well
      }
      n
    })
    plot(drg1)
    

    which draws the dendrogram with the labels and end branches for aa1 and aa2 in red and those for bb1 and bb2 in blue.
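The question also asks about coloring by column number rather than by label prefix. A minimal sketch along the same lines, using only base R, could look up each leaf's position in colnames(data.main) and index into a color vector. Note that col_by_index and drg2 are hypothetical names introduced here for illustration, and the choice of red for columns 1 and 3 and blue for columns 2 and 4 is an assumption that happens to match the example data.

```r
# Rebuild the example data so the sketch runs on its own.
data.main <- data.frame(aa1 = c(2, 4, 6, 8), bb1 = c(1, 3, 7, 11),
                        aa2 = c(3, 6, 9, 12), bb2 = c(3, 5, 7, 9))
hcl1 <- hclust(dist(t(data.main)))

# Hypothetical lookup: one color per column position of data.main.
col_by_index <- c("red", "blue", "red", "blue")

drg2 <- dendrapply(as.dendrogram(hcl1, hang = 0.1), function(n) {
  if (is.leaf(n)) {
    # column number of this leaf in the original data frame
    idx <- match(attr(n, "label"), colnames(data.main))
    leaf_col <- col_by_index[idx]
    attr(n, "nodePar") <- list(pch = NA, lab.col = leaf_col)  # color the label
    attr(n, "edgePar") <- list(col = leaf_col)                # color the end branch
  }
  n
})
plot(drg2)
```

Matching on the label keeps the lookup independent of the order in which the leaves end up in the tree, so the same color vector should work if the clustering changes.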