
Manhattan distance matrix from two matrices is non-symmetric but should be symmetric


I created two matrices whose elements are random integers; the dimensions of the matrices don't matter. Then I want to calculate the distance matrix using the Manhattan method and convert it to a matrix. The result should be symmetric, but when I convert it to a matrix, the output is a non-symmetric distance matrix.

From that matrix (which should be the output) I want to compute a clustering.

Where is my mistake?

Code:

    a <- c(sample.int(30,6))
    b <- c(sample.int(30,6))
    c <- c(sample.int(30,6))
    d <- c(sample.int(30,6))
    e <- c(sample.int(30,6))
    f <- c(sample.int(30,6))

    V2 <- rbind(a,b,c,d,e,f)
    V1 <- rbind(a,b,c,d,e,f)

    d1MNR <- matrix(dist(V1, V2, method="manhattan"))                          #### Is non-symmetric
    d1MR  <- matrix(dist(V1, V2, upper=TRUE, diag=TRUE, method="manhattan"))   #### Should be symmetric, but is not

    d1MR ### Generate output

    hclust <- hclust(dist(d1MR), method = "single")                            ### Clustering

Solution

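  • You can make a symmetric distance matrix from V1 or from V2, but the only way to make one symmetric matrix from both of them together is to combine them first: V12 <- rbind(V1, V2). dist() computes distances between the rows of a single matrix; a second matrix passed to it is not used as data but is matched positionally to one of dist()'s other arguments. The dist() function returns a "dist" object that hclust() can use directly, so you do not need to convert it to a matrix. (matrix() applied to a dist object returns only its lower triangle as a single column, which is why your result is not symmetric; as.matrix() would give the full symmetric matrix.) Also, in your example V1 and V2 are identical. We need them to be different: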

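    set.seed(42)                                          # reproducible random numbers
    V1 <- matrix(sample.int(30, 36, replace=TRUE), 6)     # 6 x 6 matrix of integers 1..30
    V2 <- matrix(sample.int(30, 36, replace=TRUE), 6)
    V12 <- rbind(V1, V2)                                  # stack both: 12 rows, 6 columns
    rownames(V12) <- paste(rep(c("V1", "V2"), each=6), 1:6, sep=":")   # row labels V1:1 ... V2:6
    colnames(V12) <- letters[1:6]
    V12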
    #       a  b  c  d  e  f
    # V1:1 17 18  4 18  4 28
    # V1:2  5 26 25 15  5  8
    # V1:3  1 17  5  3 13  3
    # V1:4 25 15 14  9  5 26
    # V1:5 10 24 20 25 20  1
    # V1:6  4  7 26 27  2 10
    # V2:1 24  8 28  3 18 22
    # V2:2 30  4  5 24  6 21
    # V2:3 11  4  4 23  6  2
    # V2:4 15 22  2 17  2 23
    # V2:5 22 18 24 21 20  6
    # V2:6 26 13 18 26  3 26
    d1MNR <- dist(V12, method="manhattan")   # "dist" object over all 12 rows
    hc <- hclust(d1MNR, method = "single")   # single-linkage clustering on the dist object
    plot(hc)                                 # draws the dendrogram
    

    If you want to see the full symmetric distance matrix, print the dist object with upper=TRUE and diag=TRUE:

    print(d1MNR, upper=TRUE, diag=TRUE)
    #      V1:1 V1:2 V1:3 V1:4 V1:5 V1:6 V2:1 V2:2 V2:3 V2:4 V2:5 V2:6
    # V1:1    0   65   67   33   79   75   76   43   53   16   66   39
    # V1:2   65    0   58   66   44   38   79   90   64   57   49   72
    # V1:3   67   58    0   72   62   76   79   88   52   67   69   98
    # V1:4   33   66   72    0   86   78   45   46   74   43   63   26
    # V1:5   79   44   62   86    0   58   83   90   54   73   31   72
    # V1:6   75   38   76   78   58    0   75   68   48   73   59   54
    # V2:1   76   79   79   45   83   75    0   67   93   80   52   59
    # V2:2   43   90   88   46   90   68   67    0   40   49   73   36
    # V2:3   53   64   52   74   54   48   93   40    0   55   65   68
    # V2:4   16   57   67   43   73   73   80   49   55    0   72   49
    # V2:5   66   49   69   63   31   59   52   73   65   72    0   57
    # V2:6   39   72   98   26   72   54   59   36   68   49   57    0
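A minimal sketch of why wrapping the result in matrix() loses the symmetry and how as.matrix() recovers it. It assumes the same set.seed(42) setup as the answer (the exact random values may differ between R versions); the names d, flat, full and cross are illustrative only. It also shows that the "V1 versus V2" distances on their own form a rectangular block that is generally not symmetric.

```r
# Sketch: matrix() vs as.matrix() on a "dist" object (illustrative names).
set.seed(42)
V1 <- matrix(sample.int(30, 36, replace = TRUE), 6)
V2 <- matrix(sample.int(30, 36, replace = TRUE), 6)
V12 <- rbind(V1, V2)
rownames(V12) <- paste(rep(c("V1", "V2"), each = 6), 1:6, sep = ":")

d <- dist(V12, method = "manhattan")

# A "dist" object stores only the lower triangle as a plain vector,
# so matrix() turns it into a single column of 12 * 11 / 2 = 66 values.
flat <- matrix(d)
dim(flat)            # 66 rows, 1 column

# as.matrix() expands it to the full, symmetric 12 x 12 matrix.
full <- as.matrix(d)
dim(full)            # 12 rows, 12 columns
isSymmetric(full)    # TRUE

# Distances between rows of V1 (rows) and rows of V2 (columns) only:
# entry [i, j] compares V1 row i with V2 row j, so this 6 x 6 block
# is generally not symmetric.
cross <- full[1:6, 7:12]
cross
```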
    

    [Figure: single-linkage dendrogram of V12]
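A minimal sketch, under the same set.seed(42) assumption, confirming that hclust() accepts the dist object directly: converting it to a full matrix and back with as.dist() yields the same merge steps and heights. Calling dist() on a distance matrix, as in the question's hclust(dist(d1MR)), instead treats each row of distances as a new feature vector. The cutree() call with k = 2 is an arbitrary illustrative choice for turning the tree into cluster memberships; the names hc, hc2 and groups are hypothetical.

```r
# Sketch: hclust() works on the "dist" object as-is (illustrative names).
set.seed(42)
V12 <- rbind(matrix(sample.int(30, 36, replace = TRUE), 6),
             matrix(sample.int(30, 36, replace = TRUE), 6))
d <- dist(V12, method = "manhattan")

hc <- hclust(d, method = "single")

# Round-tripping through a full matrix and back gives the same tree,
# so the matrix conversion is unnecessary for clustering.
hc2 <- hclust(as.dist(as.matrix(d)), method = "single")
all.equal(hc$height, hc2$height)

# Cut the tree into a chosen number of groups (k = 2 is arbitrary here).
groups <- cutree(hc, k = 2)
table(groups)
```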