
How to color the branches of an R dendrogram as a function of the classes in them?


I wish to visualize how well a clustering algorithm is doing (with a certain distance metric). I have samples and their corresponding classes. To visualize this, I cluster the samples and want to color the branches of the dendrogram by the items in each cluster. Each branch should get the color of the class that most items in that hierarchical cluster belong to (as given by the class data).

Example: if my clustering algorithm chose indexes 1, 21, 24 to be a certain cluster (at a certain level), and I have a CSV file containing a class number in each row, corresponding to, let's say, 1, 2, 1, then I want this edge to be colored with the color of class 1.
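
The majority rule in this example can be sketched in a few lines of base R. The `classes` vector and the `majority_class` helper are hypothetical names used only for illustration; the vector stands in for the class column of the CSV file, with only the three positions from the example filled in.

```r
# Hypothetical class vector: one class number per sample, indexed by sample number.
# Only the three samples from the example are filled in; the rest stay NA.
classes <- rep(NA_integer_, 24)
classes[c(1, 21, 24)] <- c(1L, 2L, 1L)

# Return the most frequent class among the members of one cluster.
# Ties are resolved in favor of the first class in table() order.
majority_class <- function(members, classes) {
  counts <- table(classes[members])
  names(counts)[which.max(counts)]
}

majority_class(c(1, 21, 24), classes)
```

The result is returned as a character string because it is taken from the names of the count table, which makes it convenient for looking up a color in a named vector.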

Example Code:

library(cluster)
suppressPackageStartupMessages(library(dendextend))
dir <- 'distance_metrics/'
filename <- 'aligned.csv'
# Symmetric distance matrix, one row and one column per sample
my.data <- read.csv(paste0(dir, filename), header = TRUE, row.names = 1)
my.dist <- as.dist(my.data)
# Known class of each sample
real.clusters <- read.csv("clusters", header = TRUE, row.names = 1)
clustered <- diana(my.dist)
dend <- as.dendrogram(clustered)
# dend <- colour_branches(???dend, max(real.clusters)???)
plot(dend)

EDIT: another partial code example:

dir <- 'distance_metrics/' # the CSV in here contains a symmetric matrix
clust.dir <- "clusters/"   # the CSV in here contains a column vector of classes
# Uses the filename defined earlier
my.data <- read.csv(paste0(dir, filename), header = TRUE, row.names = 1)
filename <- 'table.csv'
my.dist <- as.dist(my.data)
real.clusters <- read.csv(paste0(clust.dir, filename), header = TRUE, row.names = 1)
clustered <- diana(my.dist)
dnd <- as.dendrogram(clustered)

Solution

  • Both node and edge color attributes can be set recursively on "dendrogram" objects (which are just deeply nested lists) using dendrapply. The cluster package also provides an as.dendrogram method for "diana" class objects, so conversion between the two object types is seamless. Using your diana clustering and borrowing some code from @Edvardoss's iris example, you can create the colored dendrogram as follows:

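    library(cluster)
    set.seed(999)
    iris2 <- iris[sample(x = 1:150, size = 50, replace = FALSE), ]
    ## Cluster on the four numeric columns only, so that the Species
    ## label does not leak into the distances
    clust <- diana(iris2[, 1:4])
    dnd <- as.dendrogram(clust)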
    
    ## Duplicate rownames aren't allowed, so we need to set the "labels"
    ## attributes recursively. We also label inner nodes here. 
    rectify_labels <- function(node, df){
      newlab <- df$Species[unlist(node, use.names = FALSE)]
      attr(node, "label") <- newlab
      return(node)
    }
    dnd <- dendrapply(dnd, rectify_labels, df = iris2)
    
    ## Create a color palette as a data.frame with one row for each spp
    uniqspp <- as.character(unique(iris$Species))
    colormap <- data.frame(Species = uniqspp,
                           color = c("red", "blue", "green"),
                           stringsAsFactors = FALSE)
    colormap
    
    ## Now color the leaf labels and leaf edges by species, and each
    ## inner edge by the dominant species among the leaves below it
    color_dendro <- function(node, colormap){
      if(is.leaf(node)){
        nodecol <- colormap$color[match(attr(node, "label"), colormap$Species)]
        attr(node, "nodePar") <- list(pch = NA, lab.col = nodecol)
        attr(node, "edgePar") <- list(col = nodecol)
      }else{
        spp <- attr(node, "label")
        dominantspp <- levels(spp)[which.max(tabulate(spp))]
        edgecol <- colormap$color[match(dominantspp, colormap$Species)]
        attr(node, "edgePar") <- list(col = edgecol)
      }
      return(node)
    }
    dnd <- dendrapply(dnd, color_dendro, colormap = colormap)
    
    ## Plot the dendrogram
    plot(dnd)
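
The same idea can be adapted to the setup in the question, where the input is a dissimilarity object plus a separate class vector rather than a data frame with a Species column. The following is a minimal sketch under stated assumptions, not a drop-in solution: the iris subset only stands in for the asker's CSV files, `class_colors` is a hypothetical palette, and `color_by_majority` is an illustrative helper name. It assumes the class vector is in the same sample order as the distance matrix.

```r
library(cluster)

# Stand-ins for the question's inputs (assumptions, not the asker's files):
# a dissimilarity object and a class vector in the same sample order.
set.seed(999)
idx <- sample(nrow(iris), 50)
my.dist <- dist(iris[idx, 1:4])
classes <- as.character(iris$Species[idx])

dnd <- as.dendrogram(diana(my.dist))

# Hypothetical palette: one named color per class.
class_colors <- c(setosa = "red", versicolor = "blue", virginica = "green")

# Color the edge above each node by the most frequent class among
# the leaves below it. For a leaf, this is simply its own class.
color_by_majority <- function(node, classes, class_colors) {
  members <- as.integer(unlist(node, use.names = FALSE))  # leaf indices under this node
  counts <- table(classes[members])
  winner <- names(counts)[which.max(counts)]              # ties go to the first class
  attr(node, "edgePar") <- list(col = class_colors[[winner]])
  node
}

dnd <- dendrapply(dnd, color_by_majority,
                  classes = classes, class_colors = class_colors)
plot(dnd, leaflab = "none")
```

With real data, `my.dist` would come from `as.dist()` on the distance matrix and `classes` from the class column of the clusters file, provided both are in the same row order.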
    
