r, cluster-analysis, hierarchical-clustering

Hierarchical cluster analysis help - dendrogram


I wrote the code below, which generates a dendrogram with the hclust function, and I would like help interpreting it. Note that the points are located close to one another. What does this dendrogram tell me? I would appreciate a more complete analysis of the output.

library(geosphere)

Points_properties <- structure(
  list(
    Propertie = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29),
    Latitude = c(-24.781624, -24.775017, -24.769196, -24.761741, -24.752019,
                 -24.748008, -24.737312, -24.744718, -24.751996, -24.724589,
                 -24.8004, -24.796899, -24.795041, -24.780501, -24.763376,
                 -24.801715, -24.728005, -24.737845, -24.743485, -24.742601,
                 -24.766422, -24.767525, -24.775631, -24.792703, -24.790994,
                 -24.787275, -24.795902, -24.785587, -24.787558),
    Longitude = c(-49.937369, -49.950576, -49.927608, -49.92762, -49.920608,
                  -49.927707, -49.922095, -49.915438, -49.910843, -49.899478,
                  -49.901775, -49.89364, -49.925657, -49.893193, -49.94081,
                  -49.911967, -49.893358, -49.903904, -49.906435, -49.927951,
                  -49.939603, -49.941541, -49.94455, -49.929797, -49.92141,
                  -49.915141, -49.91042, -49.904772, -49.894034)
  ),
  row.names = c(NA, -29L),
  class = c("tbl_df", "tbl", "data.frame")
)

coordinates <- subset(Points_properties, select = c("Latitude", "Longitude"))
plot(coordinates[, 2:1])
text(x = Points_properties$Longitude,
     y = Points_properties$Latitude,
     labels = Points_properties$Propertie, pos = 2)


d<-distm(coordinates[,2:1])
d<-as.dist(d)
fit.average<-hclust(d,method="average")
plot(fit.average,hang=-1,cex=.8, main = "")



Solution

  • You chose to perform hierarchical clustering with the average-linkage method (method = "average").

    According to ?hclust:

    This function performs a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered. Initially, each object is assigned to its own cluster and then the algorithm proceeds iteratively, at each stage joining the two most similar clusters, continuing until there is just a single cluster. At each stage distances between clusters are recomputed

    You can follow what happens at each step through the merge component of the result:

    Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation −j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm

    fit.average$merge
          [,1] [,2]
     [1,]  -21  -22
     [2,]  -15    1
     [3,]  -13  -24
     [4,]   -6  -20
     [5,]   -2  -23
     [6,]  -16  -27
    ...
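
Reading the merge matrix row by row gets tedious, so here is a minimal sketch of a helper that turns each row into a sentence and pairs it with the matching entry of the height component. The function name describe_merges and the toy points are assumptions made for illustration; they are not part of hclust or of the question's data.

```r
# Hypothetical helper: describe each row of $merge in words,
# together with the height at which that merge happens.
describe_merges <- function(fit) {
  label <- function(j) {
    # Negative entries are single observations, positive entries are
    # clusters created at an earlier step.
    if (j < 0) paste("observation", -j) else paste("cluster from step", j)
  }
  vapply(seq_len(nrow(fit$merge)), function(i) {
    sprintf("step %d: %s + %s at height %.3f",
            i, label(fit$merge[i, 1]), label(fit$merge[i, 2]), fit$height[i])
  }, character(1))
}

# Toy data (made-up points, not the question's properties)
pts_toy <- matrix(c(0, 0,  1, 0,  5, 0,  5, 2), ncol = 2, byrow = TRUE)
fit_toy <- hclust(dist(pts_toy), method = "average")

steps <- describe_merges(fit_toy)
print(steps)
```

The same call, describe_merges(fit.average), should work on the fit from the question, since it only reads the merge and height components.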
    

    The dendrogram draws exactly this sequence of merges.

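    The height on the y-axis at which two branches join is the dissimilarity between the two clusters being merged. With method = "average", that is the mean of all pairwise distances between the members of one cluster and the members of the other, not the distance from a point to a cluster center. Since d was computed with distm, the heights are in meters.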

    1. points 21 and 22 (the closest pair) are merged first, forming cluster 1
    2. cluster 1 is then merged with point 15, forming cluster 2
    3. ...
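
To check what a merge height means under average linkage, the sketch below compares the height of the last merge with the mean of the pairwise distances between the two clusters it joins. The four points are made up for illustration and are not the question's properties; plain Euclidean dist is used instead of distm so that the example needs only base R.

```r
# Toy data (made-up points): two tight pairs, far from each other
pts_toy <- matrix(c(0, 0,
                    1, 0,
                    5, 0,
                    5, 2),
                  ncol = 2, byrow = TRUE)
d_toy   <- dist(pts_toy)                      # Euclidean distances
fit_toy <- hclust(d_toy, method = "average")

# Points 1 and 2 merge first, then 3 and 4; the last step joins {1, 2} and {3, 4}
fit_toy$merge
fit_toy$height

# Mean of the four cross-cluster distances (1-3, 1-4, 2-3, 2-4)
dm <- as.matrix(d_toy)
mean_between <- mean(dm[1:2, 3:4])

# The height of the final merge matches that mean
all.equal(fit_toy$height[3], mean_between)
```

Swapping method = "average" for "single" or "complete" would make the same height equal the minimum or maximum of those four distances instead.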

    You could then call rect.hclust, which draws rectangles around the clusters on the current dendrogram plot and accepts several arguments, such as the number of groups k you'd like:

    rect.hclust(fit.average, k=3)
    


    You can also use the output of rect.hclust to color the original points:

    groups <- rect.hclust(fit.average, k=3)
    groups
    
    #[[1]]
    # [1]  5  6  7  8  9 10 17 18 19 20
    
    #[[2]]
    # [1]  1  2  3  4 15 21 22 23
    
    #[[3]]
    #  [1] 11 12 13 14 16 24 25 26 27 28 29
    
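    # rect.hclust lists the groups left to right along the dendrogram;
    # reorder the group ids so they line up with the observation order
    colors <- rep(seq_along(groups), lengths(groups))
    colors <- colors[order(unlist(groups))]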
    
    plot(coordinates[,2:1],col = colors)
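
If only the group memberships are needed, cutree from base R returns them directly in observation order, without requiring an open dendrogram plot. The sketch below uses made-up points rather than the question's data. Note that cutree numbers the groups by first appearance among the observations, so the colors may be permuted relative to the rect.hclust approach.

```r
# Toy data (made-up points): two tight pairs plus one isolated point
pts_toy <- matrix(c(0, 0,  1, 0,  5, 0,  5, 2,  10, 10),
                  ncol = 2, byrow = TRUE)
fit_toy <- hclust(dist(pts_toy), method = "average")

# One group id per observation, in the original row order
membership <- cutree(fit_toy, k = 3)
membership

# The ids can be passed straight to col
plot(pts_toy, col = membership, pch = 19)
```

For the question's data the equivalent call would be cutree(fit.average, k = 3), and its result could be passed as col when plotting coordinates[, 2:1].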
    
