I worked through the distance matrix by hand using the complete-link method, as shown in the image below:
The pairwise distances between the merged clusters are
{0.5,1.12,1.5,3.61}
But when I implement this in R with the same matrix, using the code below, the result is different:
Matrix
x1,x2,x3,x4,x5
0,0.5,2.24,3.35,3
0.5,0,2.5,3.61,3.04
2.24,2.5,0,1.12,1.41
3.35,3.61,1.12,0,1.5
3,3.04,1.41,1.5,0
Implementation:
library(cluster)
dt<-read.csv("cluster.csv")
df<-scale(dt[-1])
dc<-dist(df,method = "euclidean")
hc1 <- hclust(dc, method = "complete" )
plot(hc1, labels = c("x1", "x2","x3","x4","x5"),
hang = 0.1,
main = "Cluster dendrogram", sub = NULL,
xlab = NULL, ylab = "Height")
abline(h = hc1$height, lty = 2, col = "lightgrey")
str(hc1)
str(hc1)
List of 7
$ merge : int [1:4, 1:2] -1 -3 -5 1 -2 -4 2 3
$ height : num [1:4] 0.444 1.516 1.851 3.753
$ order : int [1:5] 1 2 5 3 4
$ labels : NULL
$ method : chr "complete"
$ call : language hclust(d = dc, method = "complete")
$ dist.method: chr "euclidean"
- attr(*, "class")= chr "hclust"
The heights I got are: 0.444 1.516 1.851 3.753
This means the dendrogram differs between the two cases. Why is that? Have I done something wrong in one of the two approaches?
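Since the provided matrix is already a Euclidean distance matrix, I don't need to calculate distances again. In the first attempt, scale() and dist() treated each row of the matrix as a raw observation and computed new distances between the rows. Instead, I should convert the data.frame to a matrix with as.matrix() and then to a dist object with as.dist(m).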
The code below gives exactly the result I obtained from the calculation on paper:
library(reshape)
dt<-read.csv("C:/Users/Aakash/Desktop/cluster.csv")
m <- as.matrix(dt)
hc1 <- hclust(as.dist(m), method = "complete" )
plot(hc1, labels = c("x1", "x2","x3","x4","x5"),
hang = 0.1,
main = "Complete Method Dendrogram", sub = NULL,
xlab = "Items", ylab = "Height")
abline(h = hc1$height, lty = 2, col = "lightgrey")
str(hc1)
height : num [1:4] 0.5 1.12 1.5 3.61
Obtained dendrogram:
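For anyone who wants to reproduce this without the CSV file, here is a minimal sketch that types the matrix in directly. The names labs, m, hc_ok and hc_wrong are mine, not from the original code, and the second call only illustrates the mistake (it scales all five columns, so it is not expected to reproduce the exact heights from my first attempt).

```r
# Distance matrix from the question, entered by hand instead of read from cluster.csv.
labs <- c("x1", "x2", "x3", "x4", "x5")
m <- matrix(c(0,    0.5,  2.24, 3.35, 3,
              0.5,  0,    2.5,  3.61, 3.04,
              2.24, 2.5,  0,    1.12, 1.41,
              3.35, 3.61, 1.12, 0,    1.5,
              3,    3.04, 1.41, 1.5,  0),
            nrow = 5, byrow = TRUE, dimnames = list(labs, labs))

# as.dist() reuses the entries of m as distances; nothing is recomputed.
hc_ok <- hclust(as.dist(m), method = "complete")

# dist() treats each row of m as a point with five coordinates and
# computes new Euclidean distances between those rows (after scaling).
hc_wrong <- hclust(dist(scale(m)), method = "complete")

print(hc_ok$height)     # merge heights taken straight from the given distances
print(hc_wrong$height)  # merge heights from the recomputed distances
```

Here hc_ok$height comes out as 0.5, 1.12, 1.5, 3.61, matching the calculation on paper, while hc_wrong$height is computed from a different set of distances.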