
How to compare several matrices and calculate the percentage difference in R


Suppose this is my example data:

    nrow <- 4
    ncol <- 5
    m1 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
    m2 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
    m3 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)

I need to compare the three matrices pairwise, according to the following principle. For example:

m1

         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    1    0    1
    [2,]    1    1    1    0    1
    [3,]    1    0    0    1    1
    [4,]    0    0    0    1    0

m2

         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    0    0    1
    [2,]    0    1    1    0    0
    [3,]    1    0    0    0    0
    [4,]    0    1    0    1    0

Now count the number of matching values in each column. Take the first column of both matrices:

    m1 m2
     1  1   match
     1  0   no match
     1  1   match
     0  0   match

In total, 3 of the 4 values in the first columns of m1 and m2 coincide, so 75% of the values match.

Then take the second column:

    m1   m2
    [,2] [,2]
     0    0   match
     1    1   match
     0    0   match
     0    1   no match

Here, too, 3 of the 4 values coincide (75%).
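The column-by-column comparison above can be reproduced in R with ==, which compares vectors element by element; mean() of the resulting logical vector is the share of TRUE values. A minimal sketch, assuming m1 and m2 are typed in by hand to match the matrices printed above (rbinom gives different values on every run):

```r
# m1 and m2 entered by hand to match the matrices printed above
m1 <- matrix(c(1, 0, 1, 0, 1,
               1, 1, 1, 0, 1,
               1, 0, 0, 1, 1,
               0, 0, 0, 1, 0), nrow = 4, byrow = TRUE)
m2 <- matrix(c(1, 0, 0, 0, 1,
               0, 1, 1, 0, 0,
               1, 0, 0, 0, 0,
               0, 1, 0, 1, 0), nrow = 4, byrow = TRUE)

# TRUE where the two columns agree, FALSE where they differ
col1_match <- m1[, 1] == m2[, 1]   # TRUE FALSE TRUE TRUE
col2_match <- m1[, 2] == m2[, 2]   # TRUE TRUE TRUE FALSE

# mean() of a logical vector is the proportion of TRUE values
100 * mean(col1_match)             # 75
100 * mean(col2_match)             # 75
```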

In other words, the desired output should look something like this:

    m1
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    1    0    1
    [2,]    1    1    1    0    1
    [3,]    1    0    0    1    1
    [4,]    0    0    0    1    0

    m2
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    0    0    0    1
    [2,]    0    1    1    0    0
    [3,]    1    0    0    0    0
    [4,]    0    1    0    1    0
    %      75   75   75   75   50
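The whole row of column percentages can be computed in one step: m1 == m2 gives a logical 4 x 5 matrix, and colMeans() turns each column into the proportion of matches. A sketch, again assuming the two example matrices are typed in by hand:

```r
m1 <- matrix(c(1, 0, 1, 0, 1,
               1, 1, 1, 0, 1,
               1, 0, 0, 1, 1,
               0, 0, 0, 1, 0), nrow = 4, byrow = TRUE)
m2 <- matrix(c(1, 0, 0, 0, 1,
               0, 1, 1, 0, 0,
               1, 0, 0, 0, 0,
               0, 1, 0, 1, 0), nrow = 4, byrow = TRUE)

# element-wise comparison, then the share of TRUE values per column
col_pct <- 100 * colMeans(m1 == m2)
col_pct
# [1] 75 75 75 75 50
```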

Then take the average of these column percentages, which gives 70%:

(75 + 75 + 75 + 75 + 50) / 5 = 70
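Because every column has the same number of rows (4), averaging the column percentages gives the same result as taking the share of matching cells in the whole matrix, here 14 of 20. A sketch that checks this on the example matrices (typed in by hand as before):

```r
m1 <- matrix(c(1, 0, 1, 0, 1,
               1, 1, 1, 0, 1,
               1, 0, 0, 1, 1,
               0, 0, 0, 1, 0), nrow = 4, byrow = TRUE)
m2 <- matrix(c(1, 0, 0, 0, 1,
               0, 1, 1, 0, 0,
               1, 0, 0, 0, 0,
               0, 1, 0, 1, 0), nrow = 4, byrow = TRUE)

avg_of_cols <- mean(100 * colMeans(m1 == m2))  # (75 + 75 + 75 + 75 + 50) / 5
overall     <- 100 * mean(m1 == m2)            # 14 matching cells out of 20
c(avg_of_cols, overall)
# [1] 70 70
```

So for equally sized matrices, mean(m1 == m2) alone is enough to get the single summary percentage.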

What is the simplest way to calculate this percentage for every pair of matrices: first m1 vs m2, then m1 vs m3, and finally m2 vs m3?

Thank you for your help


Solution

  • Here are two base R options

    • combn + mean

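    combn applies the function to every pair of matrices in lst, and mean of the element-wise comparison (==) returns the proportion of cells where the two matrices hold the same value.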

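    # pairs are visited in the order (m1, m2), (m1, m3), (m2, m3);
    # the \(x) lambda shorthand requires R >= 4.1.0
    combn(
        lst,
        2,
        \(x) mean(do.call(`==`, x))
    )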
    

    which gives the proportion of matching cells for each pair (multiply by 100 for percentages):

    [1] 0.5 0.4 0.5
    
    • adist + toString

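    This approach turns each matrix into a string with toString and uses adist (a generalized Levenshtein distance) to build a symmetric matrix of pairwise match proportions. Because edit distance also allows insertions and deletions, it can be smaller than the number of mismatching cells (for example when one matrix's values are the other's shifted by one position), so treat it as an approximation; the combn approach counts mismatches exactly.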

    > 1 - adist(unlist(lapply(lst, toString))) / lengths(lst)
         [,1] [,2] [,3]
    [1,]  1.0  0.5  0.4
    [2,]  0.5  1.0  0.5
    [3,]  0.4  0.5  1.0
    

    Data

    set.seed(0)
    nrow <- 4
    ncol <- 5
    m1 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
    m2 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
    m3 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
    
    lst <- list(m1, m2, m3)
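Two possible extensions, sketched under the assumption that the data are generated exactly as in the Data section. First, the index matrix from combn(length(lst), 2) can label each pair and give percentages instead of proportions, including the per-column breakdown from the question; pair_pct holds the same values as the combn result, scaled to percentages. Second, a double loop over all pairs gives an exact version of the adist matrix by comparing cells position by position. pct_match_matrix is a hypothetical helper name, and the toy matrices x and y only illustrate a case where the edit distance undercounts mismatches.

```r
# Same simulated data as in the Data section
set.seed(0)
nrow <- 4
ncol <- 5
m1 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
m2 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
m3 <- matrix(rbinom(nrow * ncol, 1, .5), nrow, ncol)
lst <- list(m1 = m1, m2 = m2, m3 = m3)  # names are only used for labels

# 2 x 3 matrix of list indices, one column per pair: (1, 2), (1, 3), (2, 3)
idx <- combn(length(lst), 2)
pair_names <- paste(names(lst)[idx[1, ]], "vs", names(lst)[idx[2, ]])

# percentage of matching cells for each pair
pair_pct <- apply(idx, 2, function(p) 100 * mean(lst[[p[1]]] == lst[[p[2]]]))
names(pair_pct) <- pair_names
pair_pct

# per-column percentages: one row per matrix column, one column per pair
col_pct <- apply(idx, 2, function(p) 100 * colMeans(lst[[p[1]]] == lst[[p[2]]]))
colnames(col_pct) <- pair_names
col_pct

# hypothetical helper: exact pairwise match proportions, same layout as the adist output
pct_match_matrix <- function(lst) {
  n <- length(lst)
  out <- matrix(NA_real_, n, n, dimnames = list(names(lst), names(lst)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- mean(lst[[i]] == lst[[j]])  # proportion of equal cells
    }
  }
  out
}
pct_match_matrix(lst)

# Toy case where edit distance and cell mismatches disagree:
# y flips every 0/1 in x, so no cell matches
x <- matrix(rep(0:1, 10), 4, 5)
y <- 1 - x
sum(x != y)                      # 20 mismatching cells
adist(toString(x), toString(y))  # far fewer edits: shifting by one element is cheaper
pct_match_matrix(list(x, y))     # off-diagonal entries are 0
```

The loop is quadratic in the number of matrices, which is negligible for a handful of them; for a long list, computing only the upper triangle and mirroring it would halve the work.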