I have two tables in R (females and males) with presence-absence data. I'd like to do pairwise comparisons between them (row by row) to find the number of cells not shared between each pair (i.e. the number of cells equal to 1 in the female but not in the male, plus the number equal to 1 in the male but not in the female).
I know that the cross product (%*%) does the opposite of what I need. It creates a new matrix containing the number of shared cells between each pair of females and males (i.e. the sum of cells equal to 1 in both).
Here is an example dataset:
females <- as.data.frame(matrix(c(0,0,0,1,1,0,1,0,1,0,1,0,1,0,1,1,1,0,1,1,1,0,1,1,1), nrow=5, byrow=TRUE))
males <- as.data.frame(matrix(c(1,0,0,1,1,0,1,0,1,1,1,0,1,0,1,1,1,0,1,1,1,0,1,0,1), nrow=5, byrow=TRUE))
rownames(females) <- c("female_1","female_2","female_3","female_4","female_5")
rownames(males) <- c("male_1","male_2","male_3","male_4","male_5")
So, if I compute the cross product:
as.matrix(females) %*% t(as.matrix(males))
I get this:
         male_1 male_2 male_3 male_4 male_5
female_1      2      2      1      2      1
female_2      1      2      0      2      0
female_3      2      1      3      2      3
female_4      3      3      2      4      2
female_5      3      2      3      3      3
But I need this (only the first row is shown):
         male_1 male_2 male_3 male_4 male_5
female_1      1      1      3      2      3
.
.
In reality, my dataset is not square: the two tables have different numbers of rows (47 females and 32 males).
Thanks for any help!!!
Set up an object to receive results:
xy <- matrix(NA, nrow(females), nrow(males))
for (x in seq_len(nrow(females))) {
  for (y in seq_len(nrow(males))) {
    # count the columns where female x and male y differ
    xy[x, y] <- sum(females[x, ] != males[y, ])
  }
}
This could also be done with nested sapply calls, which is a bit cleaner since there is no need for a separate "setup" step (but only a little bit cleaner and, contrary to popular myth, not any faster). Note that sapply fills its result column by column, so the nested call on its own returns males in rows and females in columns; t() puts females back in rows so the result matches the loop version and the dimnames assigned afterwards:
xy <- t(sapply(seq_len(nrow(females)),
         function(x) sapply(seq_len(nrow(males)),
             function(y) sum(females[x, ] != males[y, ]))))
xy
#-----
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    3    2    3
[2,]    3    1    5    2    5
[3,]    2    4    0    3    0
[4,]    1    1    3    0    3
[5,]    1    3    1    2    1
dimnames(xy) <- list( rownames(females), rownames(males) )
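Since the question already starts from the cross product, the same counts can probably also be obtained without any explicit loop. For 0/1 data, the number of mismatches between two rows equals (ones in the female row) + (ones in the male row) - 2 * (ones shared by both). The following is only a sketch under that assumption (strictly 0/1 values, no NAs, same number of columns in both tables); the names fem, mal, shared and mismatch are illustrative and not part of the original code:

```r
# Example data from the question, stored as numeric matrices
fem <- matrix(c(0,0,0,1,1, 0,1,0,1,0, 1,0,1,0,1, 1,1,0,1,1, 1,0,1,1,1),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("female_", 1:5), NULL))
mal <- matrix(c(1,0,0,1,1, 0,1,0,1,1, 1,0,1,0,1, 1,1,0,1,1, 1,0,1,0,1),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("male_", 1:5), NULL))

# Cells equal to 1 in both members of each female/male pair
shared <- fem %*% t(mal)

# Mismatches = ones in female + ones in male - 2 * shared ones
mismatch <- outer(rowSums(fem), rowSums(mal), "+") - 2 * shared

mismatch["female_1", ]
# male_1 male_2 male_3 male_4 male_5
#      1      1      3      2      3
```

Nothing here requires the two tables to have the same number of rows, so it should carry over to the 47 x 32 case as long as both tables have the same columns in the same order.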