Tags: r, cluster-analysis, hierarchical-clustering, dendrogram, dendextend

Color branches of dendrogram using an existing column


I have a data frame that I am trying to cluster, currently with hclust. The data frame has a FLAG column, and I would like to color the dendrogram by it. From the resulting picture, I hope to see how similar the FLAG categories are to each other. My data frame looks something like this:

FLAG    ColA    ColB    ColC    ColD

I am clustering on ColA, ColB, ColC and ColD. I would like to color the resulting dendrogram according to the FLAG category, for example red if FLAG is 1 and blue if it is 0 (there are only two categories). Right now I am using the plain version of cluster plotting:

hc<-hclust(dist(data[2:5]),method='complete')
plot(hc)

Any help in this regard would be highly appreciated.


Solution

  • To color the branches of a dendrogram by a variable, the following code (adapted from the help page for the dendrapply function) colors the edge leading to each leaf:

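    set.seed(1)  # make the random group colors reproducible
    x <- 1:100
    dim(x) <- c(10, 10)
    # one color per row of x, standing in for the grouping variable
    groups <- sample(c("red", "blue"), 10, replace = TRUE)
    
    x.clust <- as.dendrogram(hclust(dist(x)))
    
    local({
      colLab <<- function(n) {
        if (is.leaf(n)) {
          a <- attributes(n)
          i <<- i + 1
          # color the edge that leads to this leaf
          attr(n, "edgePar") <- c(a$edgePar, list(col = mycols[i]))
        }
        n
      }
      # leaves are visited left to right, not in row order, so reorder
      # the colors to match the leaf order of the dendrogram
      mycols <- groups[order.dendrogram(x.clust)]
      i <- 0
    })
    
    x.clust.dend <- dendrapply(x.clust, colLab)
    plot(x.clust.dend)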
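
    Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The toy data frame `df`, its random values, and the helper name `color_by_label` are assumptions made for illustration; only the column names and the red/blue mapping come from the question. Instead of counting leaves, this version looks up each leaf's color by its label (the row name), so it does not depend on the order in which leaves are visited.

```r
# Toy data standing in for the asker's data frame (values are made up).
set.seed(42)
df <- data.frame(
  FLAG = rep(c(0, 1), each = 5),
  ColA = rnorm(10), ColB = rnorm(10),
  ColC = rnorm(10), ColD = rnorm(10)
)

# Cluster on the four numeric columns only, as in the question.
hc   <- hclust(dist(df[2:5]), method = "complete")
dend <- as.dendrogram(hc)

# One color per row, named by row name: red for FLAG == 1, blue for FLAG == 0.
row_cols <- ifelse(df$FLAG == 1, "red", "blue")
names(row_cols) <- rownames(df)

# Hypothetical helper for dendrapply(): for each leaf, look up the color
# by the leaf's label (the row name) and set it on the edge above the leaf.
color_by_label <- function(node) {
  if (is.leaf(node)) {
    col <- row_cols[[attr(node, "label")]]
    attr(node, "edgePar") <- c(attr(node, "edgePar"), list(col = col))
  }
  node
}

colored <- dendrapply(dend, color_by_label)
plot(colored)
```

    Only the edge directly above each leaf is colored; the internal branches keep the default color, since a branch that joins rows with different FLAG values has no single category.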