
Converting Presence/Absence data to co-occurrence matrix in R


I am trying to create a co-occurrence matrix from presence/absence data. The data currently record whether each species possesses each cell type (X, Y, or Z): a 1 means the cell type is present in that species, and a 0 means it is absent.

species <- c("A", "B", "C")
x <- c(1, 1, 1)
y <- c(0, 1, 0)
z <- c(1, 1, 0)

df_pa_species <- data.frame(species, x, y, z)

This produces a data frame that looks like this:

Species           X               Y               Z
A                 1               0               1
B                 1               1               1
C                 1               0               0

I am trying to produce a matrix that counts how many times two cell types co-occur. In this example, X co-occurs with Z twice and with Y once. My ideal output would be a matrix like this:

         X       Y       Z
X       -        1       2
Y       1        -       1
Z       2        1       -

I am struggling to produce this matrix because the number of 1s in each row varies by species. I have tried converting the data into this format:

db4 <- data.frame(df_pa_species[1], cols = apply(df_pa_species[-1], 1, function(x)
  paste(names(x)[x==1], collapse=",")), stringsAsFactors = FALSE)

Species     cols
A                x,z
B                x,y,z
C                x

My ultimate goal was to get it closer to these examples I've seen: (How to calculate a (co-)occurrence matrix from a data frame with several columns using R?, Transforming matrix of presence/absence to Data.frame of vertice connection. (Removing duplicated rows with equal unordered values)). However, my rows do not sum to the same number, and I can't get these approaches to work with rows of unequal length. Any ideas?


Solution

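  • You can use crossprod for this. It computes t(m) %*% m on the 0/1 matrix, so each off-diagonal entry counts the species in which both cell types are present.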

    result <- crossprod(as.matrix(df_pa_species[,-1]))
    diag(result) <- NA
    result
    #    x  y  z
    # x NA  1  2
    # y  1 NA  1
    # z  2  1 NA
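
To see why this works, here is a minimal sketch that rebuilds the example data and checks the crossprod result against a direct count. The species labels and the helper names m, co, n_xz and totals are assumptions made for illustration; they are not part of the original answer.

```r
# Rebuild the example data from the question (species labels A, B, C assumed).
df_pa_species <- data.frame(
  species = c("A", "B", "C"),
  x = c(1, 1, 1),
  y = c(0, 1, 0),
  z = c(1, 1, 0)
)

# Drop the species column so only the 0/1 presence columns remain.
m <- as.matrix(df_pa_species[, -1])

# crossprod(m) is t(m) %*% m: entry [i, j] sums m[, i] * m[, j] over species,
# and that product is 1 only where both cell types are present.
co <- crossprod(m)

# Cross-check one pair by counting the species that have both x and z.
n_xz <- sum(m[, "x"] == 1 & m[, "z"] == 1)

# Before the diagonal is blanked, it holds the number of species
# possessing each cell type on its own.
totals <- diag(co)
```

Because the calculation is a column-by-column product, it does not matter that each species has a different number of 1s, which is what made the comma-separated intermediate format awkward.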