
Opposite of cross-product: How to create a new matrix from the intersection of two matrices?


I have two tables in R (females and males) with presence-absence data. I'd like to do pairwise comparisons between them (row by row) to find the number of cells not shared between each pair (i.e., the sum of cells equal to 1 in the female but not in the male, and vice versa).

I know that the cross product (%*%) does the opposite of what I need: it creates a new matrix containing the number of shared cells for each pair of males and females (i.e., the sum of cells equal to 1 in both).

Here is an example dataset:

females <- as.data.frame(matrix(c(0,0,0,1,1,0,1,0,1,0,1,0,1,0,1,1,1,0,1,1,1,0,1,1,1), nrow=5, byrow=TRUE))
males <- as.data.frame(matrix(c(1,0,0,1,1,0,1,0,1,1,1,0,1,0,1,1,1,0,1,1,1,0,1,0,1), nrow=5, byrow=TRUE))
rownames(females) <- c("female_1","female_2","female_3","female_4","female_5")
rownames(males) <- c("male_1","male_2","male_3","male_4","male_5")

So, if I do the cross product

as.matrix(females) %*% t(as.matrix(males))

I get this

            male_1 male_2 male_3 male_4 male_5
female_1      2      2      1      2      1
female_2      1      2      0      2      0
female_3      2      1      3      2      3
female_4      3      3      2      4      2
female_5      3      2      3      3      3

But I need this (only first row shown)

            male_1 male_2 male_3 male_4 male_5
female_1      1      1      3      2      3
.
.
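
To make the target concrete: each entry is the number of columns in which a female row and a male row differ. The sketch below is only a sanity check on a few values, with the rows copied by hand from the example data into plain vectors (the vector names are just for illustration); it is not a solution for the whole table.

```r
# Rows copied from the example data above
female_1 <- c(0, 0, 0, 1, 1)
male_1   <- c(1, 0, 0, 1, 1)
male_3   <- c(1, 0, 1, 0, 1)

# What the cross product gives: cells equal to 1 in both rows
shared_f1_m1 <- sum(female_1 == 1 & male_1 == 1)   # 2

# What I need: cells that differ between the two rows
diff_f1_m1 <- sum(female_1 != male_1)   # 1 (only column 1 differs)
diff_f1_m3 <- sum(female_1 != male_3)   # 3 (columns 1, 3 and 4 differ)
```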

In reality, my dataset is not symmetrical (I have 47 females and 32 males).

Thanks for any help!!!


Solution

  • Set up an object to receive results:

    xy <- matrix(NA, nrow(females), nrow(males))
    for (x in 1:nrow(females)) {
      for (y in 1:nrow(males)) {
        # count the columns where female x and male y differ
        xy[x, y] <- sum(females[x, ] != males[y, ])
      }
    }
    

    This could also be done with nested sapply calls, which is a bit cleaner since no separate "setup" step is needed (but only a little bit cleaner and, contrary to popular myth, not any faster):

     xy <- sapply( 1:nrow(females) , 
                  function(x) sapply( 1:nrow(males) , 
                      function(y) sum( females[x, 1:ncol(females)] != males[y,1:ncol(males)]) ))
     xy
    #-----
         [,1] [,2] [,3] [,4] [,5]
    [1,]    1    3    2    1    1
    [2,]    1    1    4    1    3
    [3,]    3    5    0    3    1
    [4,]    2    2    3    0    2
    [5,]    3    5    0    3    1
    
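    # The nested sapply() result has one column per female, so it is
    # males-by-females; transpose it so that females are in the rows.
    xy <- t(xy)
    dimnames(xy) <- list( rownames(females), rownames(males) )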
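
Since the question starts from the cross product, it may be worth noting that the mismatch counts can also be derived from it without any loop. This is a sketch and not part of the approach above: it assumes the data are strictly 0/1 with no NAs, and `count_mismatches` is a hypothetical helper name. For two 0/1 rows f and m, the number of differing cells is sum(f) + sum(m) - 2 * sum(f * m), and the last term is exactly what %*% computes.

```r
# Example data from the question
females <- as.data.frame(matrix(c(0,0,0,1,1, 0,1,0,1,0, 1,0,1,0,1, 1,1,0,1,1, 1,0,1,1,1),
                                nrow = 5, byrow = TRUE))
males   <- as.data.frame(matrix(c(1,0,0,1,1, 0,1,0,1,1, 1,0,1,0,1, 1,1,0,1,1, 1,0,1,0,1),
                                nrow = 5, byrow = TRUE))
rownames(females) <- paste0("female_", 1:5)
rownames(males)   <- paste0("male_", 1:5)

# Hypothetical helper: pairwise mismatch counts between the rows of a and b.
# Assumes both inputs contain only 0/1 values and have the same columns.
count_mismatches <- function(a, b) {
  A <- as.matrix(a)
  B <- as.matrix(b)
  shared <- A %*% t(B)   # cells equal to 1 in both rows (the cross product)
  # ones in a's row + ones in b's row, minus the shared ones counted twice
  outer(rowSums(A), rowSums(B), "+") - 2 * shared
}

xy2 <- count_mismatches(females, males)
xy2["female_1", ]
# male_1 male_2 male_3 male_4 male_5
#      1      1      3      2      3
```

The result has one row per row of the first argument and one column per row of the second, so it should carry over to a non-square case such as 47 females by 32 males, and it keeps the row names as dimnames.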