
Plot the y-axis with the actual values from hierarchical clustering in R


I am trying to plot a dendrogram using the complete linkage method in R.

My data set is:

x1,x2,x3,x4,x5
0,0.5,2.24,3.35,3
0.5,0,2.5,3.61,3.04
2.24,2.5,0,1.12,1.41
3.35,3.61,1.12,0,1.5
3,3.04,1.41,1.5,0
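To make the example reproducible without cluster.csv, the same table can be read from a string with read.csv(text = ...). The matrix is symmetric with a zero diagonal, which suggests it already holds the pairwise distances between x1 to x5; the two checks below test exactly that. The names csv_text and m are only illustrative.

```r
# Read the table above from a string instead of cluster.csv
csv_text <- "x1,x2,x3,x4,x5
0,0.5,2.24,3.35,3
0.5,0,2.5,3.61,3.04
2.24,2.5,0,1.12,1.41
3.35,3.61,1.12,0,1.5
3,3.04,1.41,1.5,0"
dt <- read.csv(text = csv_text)

m <- as.matrix(dt)
rownames(m) <- colnames(m)   # name the rows x1..x5 to match the columns

isSymmetric(m)       # TRUE: m[i, j] equals m[j, i]
all(diag(m) == 0)    # TRUE: every point is at distance 0 from itself
```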

So far I have tried the code below and got the output shown in the figure:

dt <- read.csv("cluster.csv")
df <- scale(dt[-1])
dc <- dist(df, method = "euclidean")
hc1 <- hclust(dc, method = "complete")
plot(hc1, labels = NULL, hang = 0.1,
     main = "Cluster dendrogram", sub = NULL,
     xlab = NULL, ylab = "Height")

[Image: the resulting cluster dendrogram]

Now I want to:

  • show the actual values on the y-axis, i.e. the pairwise distances between clusters at which each merge happens

  • label the x-axis with x1, x2, x3, x4, x5

How can I do this with plot()? I am learning R and am stuck here.

Edit:

As suggested in the answer, I changed the labels to

labels = c("x1", "x2", "x3", "x4", "x5")

and got this output:

[Image: the dendrogram with leaves labelled x1 to x5]

Now I want the y-axis to show the values calculated as the merge heights.


Solution

  • You can access the values like this:

    dt <- read.csv("cluster.csv")
    df <- scale(dt[-1])  # brackets, not parentheses: dt(-1) calls the function stats::dt() instead of subsetting
    dc <- dist(df, method = "euclidean")
    hc1 <- hclust(dc, method = "complete")
    plot(hc1, labels = NULL, hang = 0.1,
         main = "Cluster dendrogram", sub = NULL,
         xlab = NULL, ylab = "Height")
    str(hc1)  # list the components stored in the hclust object
    

    Returns:

    List of 7
     $ merge      : int [1:4, 1:2] -1 -3 -5 1 -2 -4 2 3
     $ height     : num [1:4] 0.444 1.516 1.851 3.753
     $ order      : int [1:5] 1 2 5 3 4
     $ labels     : NULL
     $ method     : chr "complete"
     $ call       : language hclust(d = dc, method = "complete")
     $ dist.method: chr "euclidean"
     - attr(*, "class")= chr "hclust"
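In the str() output, merge and height line up row by row: row i of merge gives the two clusters joined at step i (a negative entry -k is observation k, a positive entry j is the cluster formed at step j), and height[i] is the distance at which they were joined. A minimal sketch that repeats the answer's pipeline on the inline data (instead of cluster.csv) and prints both side by side; the steps data frame is just an illustrative name:

```r
csv_text <- "x1,x2,x3,x4,x5
0,0.5,2.24,3.35,3
0.5,0,2.5,3.61,3.04
2.24,2.5,0,1.12,1.41
3.35,3.61,1.12,0,1.5
3,3.04,1.41,1.5,0"
dt <- read.csv(text = csv_text)

df <- scale(dt[-1])                    # as in the answer: drop column x1, then standardize
dc <- dist(df, method = "euclidean")   # Euclidean distances between the 5 rows
hc1 <- hclust(dc, method = "complete")

# One row per merge step: which clusters were joined and at what height
steps <- data.frame(step   = seq_along(hc1$height),
                    left   = hc1$merge[, 1],
                    right  = hc1$merge[, 2],
                    height = hc1$height)
print(steps)
```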
    

    As you can see, none of these components is a vector with five values, which is what you would need to map directly to the labels in your plot (one label per observation); height, for instance, has four values, one per merge. If you know how to compute the values you want, put them in a five-element vector and pass it as labels =, replacing the current NULL.
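Since the CSV appears to already contain pairwise distances (symmetric, zero diagonal), one option, assuming that is really what it holds, is to hand it to hclust() through as.dist() instead of running scale() and dist() on it again; the merge heights are then values taken straight from the table. To show those heights on the y-axis, plot with axes = FALSE and draw the axis yourself with axis(), placing ticks at hc$height. A sketch under those assumptions (the object names are illustrative):

```r
csv_text <- "x1,x2,x3,x4,x5
0,0.5,2.24,3.35,3
0.5,0,2.5,3.61,3.04
2.24,2.5,0,1.12,1.41
3.35,3.61,1.12,0,1.5
3,3.04,1.41,1.5,0"
dt <- read.csv(text = csv_text)

m <- as.matrix(dt)
rownames(m) <- colnames(m)   # x1..x5 become the leaf labels on the x-axis

# Assumption: the table already holds pairwise distances, so use it as-is
d <- as.dist(m)
hc <- hclust(d, method = "complete")

plot(hc, hang = 0.1, axes = FALSE,   # suppress the default y-axis
     main = "Cluster dendrogram", sub = "", xlab = "", ylab = "Height")
# Ticks at the actual merge heights, labelled with their values
axis(2, at = hc$height, labels = round(hc$height, 2), las = 1)

hc$height   # the four merge heights
```

With complete linkage on this table the merges should happen at 0.5 (x1 with x2), 1.12 (x3 with x4), 1.5 (x5 joins x3/x4) and 3.61 (the two groups). If the file is actually raw feature data rather than distances, keep the answer's scale()/dist() pipeline; the same axis() call works on hc1.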