Labelling Vertical and Horizontal Dendrograms


I am new to R, and I am trying to construct labelled horizontal and vertical dendrograms using dist() and hclust(). I have built six different versions but cannot get the labels to appear. Any suggestions would be appreciated.

I have tried many different ways to label these dendrograms without success, using as.dendrogram(), colnames(), rownames(), and label(). However, the output dendrograms have meaningless labels. I am trying to label the dendrograms by "Family": "X22", "X4", "X75", "X87". Below are the methods I tried, to no avail.

Here is the dataframe:

  Family SBI.CV.mean
1    X22    59.25926
2     X4    57.40741
3    X75    56.19918
4    X87    59.97886


library(dendextend)
family1$Family <- as.factor(family1$Family)
class(family1$Family)
str(family1)

family2 <- ddply(family1,.(Family), summarise, 
SBI.CV.mean =    mean(SBI.CV))
family2
class(family2)

par(mfrow = c(3,3))
x_dist <- dist(x=family2$SBI.CV, method="euclidean")
x_dist
class(x_dist)

x_dist <- read.table(header=T, text=c("X22", "X4", "X75", "X87"))
x_dist2=as.matrix(x_dist2, labels=TRUE,)
colnames(x_dist) <- rownames(x_dist) <- x_dist2[["X22","X4","X75","X87"]]
x_dist2

This code produces the following distance matrix, but it is not labelled:

          1        2        3
 2 1.851852                  
 3 3.060077 1.208225         
 4 0.719598 2.571450 3.779675

These are my attempts to add labels:

require(graphics)
labs=paste(c("X22", "X4", "X75", "X87"), 1:4, sep="")
x_dist2 <- x_dist
x_dist2
colnames(x_dist2) <- labs
Dendro.data <- hclust(dist(x_dist2), "euclidean")
plot(as.dendrogram(Dendro.data), horiz=T)

require(graphics)
labs=paste(c("X22", "X4", "X75", "X87"), 1:4, sep="")
x_dist3 <- x_dist
colnames(x_dist3) <- labs
Dendro.data <- hclust(dist(x_dist3), "ave")
plot(as.dendrogram(x_dist3), hang=-1)
str(Dendro.data)

hc <- hclust(dist(family2$SBI.CV), "ave")
plot(hc)
plot(as.dendrogram(hc, hang=0.02), horiz = TRUE)

dend1 <- as.dendrogram(Dendro.data)
dend1
dend1_mod_01 <- dend1
dend1_mod_01 <- colour_branches(dend1_mod_01, k=2)
col_for_labels <- c("purple","purple","orange","purple",
"orange","dark   green")

dend_mod_01 <- color_labels(dend1_mod_01,col=col_for_labels)
plot(Dendro.data)
plot(dend1_mod_01)

Solution

  • As far as I understand, you are asking two questions, and I'll try to answer both:

    1) How do you control the names of items in a dist object?

    The easiest way is to set the row names of the matrix or data.frame that is passed to dist(), because dist() carries those row names over as the item names. For example:

    > x <- data.frame(value = 6:9)
    > x
      value
    1     6
    2     7
    3     8
    4     9
    > rownames(x)
    [1] "1" "2" "3" "4"
    > # dist() identifies the items by their row names.
    > # By default these are the integers 1..n:
    > dist(x) 
      1 2 3
    2 1    
    3 2 1  
    4 3 2 1
    > 
    > rownames(x) <- letters[1:4]
    > x
      value
    a     6
    b     7
    c     8
    d     9
    > rownames(x)
    [1] "a" "b" "c" "d"
    > # dist() picks up the new row names,
    > # so the items are now identified by letters:
    > dist(x) 
      a b c
    b 1    
    c 2 1  
    d 3 2 1
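
Applied to the data in the question, the same idea might look like the sketch below. It assumes the summary table holds exactly the four rows shown in the question (the values are copied from there), and it reuses the names family2, x_dist and hc from the question for convenience only.

```r
# Rebuild the summary table from the question (values copied from the post).
family2 <- data.frame(
  Family      = c("X22", "X4", "X75", "X87"),
  SBI.CV.mean = c(59.25926, 57.40741, 56.19918, 59.97886)
)

# Use the Family column as row names so that dist() carries them along.
rownames(family2) <- as.character(family2$Family)

# Compute distances on the numeric column only; drop = FALSE keeps the
# result a data.frame, so the row names are not lost.
x_dist <- dist(family2[, "SBI.CV.mean", drop = FALSE], method = "euclidean")
x_dist   # the distance matrix is now labelled by Family

# hclust() copies the labels from the dist object.
hc <- hclust(x_dist, method = "average")

plot(hc)                               # vertical dendrogram
plot(as.dendrogram(hc), horiz = TRUE)  # horizontal dendrogram
```

Passing a bare vector such as family2$SBI.CV.mean to dist() drops the row names, which is one likely reason the matrix in the question came out numbered 1 to 4.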
    

    2) How do you control the names of items in a dendrogram object?

    For this it is best to use the dendextend package:

    > x <- data.frame(value = 6:9)
    > x
      value
    1     6
    2     7
    3     8
    4     9
    > dist(x) 
      1 2 3
    2 1    
    3 2 1  
    4 3 2 1
    > hc <- hclust(dist(x))
    > dend <- as.dendrogram(hc)
    > plot(dend)
    > # the default labels are the item names from the dist object:
    > labels(dend)
    [1] 1 2 3 4
    > # Using dendextend we can update them:
    > library(dendextend)
    > labels(dend) <- letters[1:4]
    > labels(dend)
    [1] "a" "b" "c" "d"
    > plot(dend)
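
One caveat worth checking when relabelling: labels() on a dendrogram lists the leaves in plotting order (left to right), which need not match the row order of the data. The sketch below, using the values from the question, reorders the new names with order.dendrogram() before assigning them. The names fam, vals and new_labels are made up for this illustration, and the base-R branch is only a fallback in case dendextend is not installed.

```r
fam  <- c("X22", "X4", "X75", "X87")
vals <- c(59.25926, 57.40741, 56.19918, 59.97886)

hc   <- hclust(dist(vals), method = "average")
dend <- as.dendrogram(hc)

# Leaves are reported left to right as plotted, not in row order.
labels(dend)
order.dendrogram(dend)  # row index of each leaf, in the same order

# Reorder the names so that each leaf gets the name of its own row.
new_labels <- fam[order.dendrogram(dend)]

if (requireNamespace("dendextend", quietly = TRUE)) {
  library(dendextend)
  labels(dend) <- new_labels   # dendextend assigns in leaf order
} else {
  # Base-R fallback: hclust stores labels in row order, so label the
  # hclust object first and convert afterwards.
  hc$labels <- fam
  dend <- as.dendrogram(hc)
}

plot(dend, horiz = TRUE)
```

In the 6:9 example above the leaf order happens to coincide with the row order, so assigning letters[1:4] directly gives the expected result; with real data it is safer to reorder first.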
    

    I hope this helps.

    Tal