I have a data frame which I am trying to cluster. I am using hclust right now. In my data frame, there is a FLAG column which I would like to color the dendrogram by. From the resulting picture, I am trying to figure out similarities among the various FLAG categories. My data frame looks something like this:
FLAG ColA ColB ColC ColD
I am clustering on ColA, ColB, ColC and ColD. I would like to cluster these and color the result according to the FLAG categories, e.g. red if 1, blue if 0 (I have only two categories). Right now I am using the vanilla version of cluster plotting.
hc <- hclust(dist(data[2:5]), method = 'complete')
plot(hc)
Any help in this regard would be highly appreciated.
If you want to color the branches of a dendrogram by a certain variable, the following code (largely taken from the help page for the dendrapply function) should give the desired result. It colors the edge leading to each leaf:
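# Toy data: a 10 x 10 matrix, so 10 observations (rows) are clustered.
x <- 1:100
dim(x) <- c(10, 10)
# One color per observation, in the original row order.
groups <- sample(c("red", "blue"), 10, replace = TRUE)
x.clust <- as.dendrogram(hclust(dist(x)))
local({
  colLab <<- function(n) {
    if (is.leaf(n)) {
      # Leaves are visited left to right, so count them as we go.
      i <<- i + 1
      attr(n, "edgePar") <- list(col = mycols[i])
    }
    n
  }
  # Reorder the colors to match the left-to-right leaf order of the
  # dendrogram; otherwise colors follow plot position, not the observation.
  mycols <- groups[order.dendrogram(x.clust)]
  i <- 0
})
x.clust.dend <- dendrapply(x.clust, colLab)
plot(x.clust.dend)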
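Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The data is randomly generated as a stand-in for the real data, and the names color_leaf, row_cols, dend and dend_col are made up for illustration. Instead of a running counter, this version looks up each leaf's color through the row index stored in the leaf, so it does not depend on the order in which the leaves are visited.

```r
# Hypothetical data standing in for the asker's data frame:
# a binary FLAG column followed by four numeric columns.
set.seed(42)
data <- data.frame(
  FLAG = rep(c(0, 1), each = 5),
  ColA = rnorm(10), ColB = rnorm(10), ColC = rnorm(10), ColD = rnorm(10)
)

# Cluster on the four numeric columns only (columns 2 to 5).
hc <- hclust(dist(data[2:5]), method = "complete")
dend <- as.dendrogram(hc)

# One color per observation, in the original row order:
# red where FLAG is 1, blue where FLAG is 0.
row_cols <- ifelse(data$FLAG == 1, "red", "blue")

# A dendrogram leaf holds the row index of its observation, so the
# color can be looked up directly from row_cols.
color_leaf <- function(n) {
  if (is.leaf(n)) {
    attr(n, "edgePar") <- list(col = row_cols[as.integer(n)])
  }
  n
}

dend_col <- dendrapply(dend, color_leaf)
plot(dend_col)
```

With the real data, only the construction of data would be dropped; the rest should carry over as long as FLAG is coded as 0 and 1.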