
Two-way cross-tabulation of multiple columns


I have data on patient diagnoses. There are 13 columns, each for a different disease type, and they all store binary values (0 = no disease, 1 = disease present). I want to see the co-occurrence of the diseases, i.e. how many times disease type 1 occurs in a patient who also has disease type 2. My data (simplified):

    # 20 patients with random IDs and six 0/1 disease indicators
    ID <- sample(10:50, 20)
    type1 <- sample(0:1, 20, replace = TRUE)
    type2 <- sample(0:1, 20, replace = TRUE)
    type3 <- sample(0:1, 20, replace = TRUE)
    type4 <- sample(0:1, 20, replace = TRUE)
    type5 <- sample(0:1, 20, replace = TRUE)
    type6 <- sample(0:1, 20, replace = TRUE)

    data <- data.frame(ID, type1, type2, type3, type4, type5, type6)

I am looking for something similar to the output of cor(), but giving counts rather than a frequency or correlation measure.
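For a single pair of columns, the count described above is the number of rows in which both indicators equal 1, which is also the "1, 1" cell of an ordinary two-way table(). A minimal sketch, assuming a small hypothetical data frame `toy` (not the asker's data) so the result can be checked by hand:

```r
# Hypothetical toy data: 5 patients, 3 binary disease indicators
toy <- data.frame(
  type1 = c(1, 0, 1, 1, 0),
  type2 = c(1, 1, 0, 1, 0),
  type3 = c(0, 0, 1, 1, 1)
)

# Patients who have both type1 and type2 (rows 1 and 4)
pair_count <- sum(toy$type1 == 1 & toy$type2 == 1)
print(pair_count)

# The same number is the "1","1" cell of a two-way cross-tabulation
tab <- table(toy$type1, toy$type2)
print(tab)
print(tab["1", "1"])
```

Doing this with table() for every pair of 13 columns means 78 separate tables, which is why a single matrix of counts, in the shape of cor() output, is more convenient.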


Solution

  • Try this:

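    # Drop the ID column; keep only the 0/1 disease indicators
    data_matrix <- as.matrix(data[, 2:ncol(data)])
    # crossprod(X) is t(X) %*% X: entry [i, j] = patients with both disease i and j
    cooccurrence <- crossprod(data_matrix)
    # The diagonal holds each disease's total count; set it to 0 to keep only pairs
    diag(cooccurrence) <- 0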
    

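    crossprod(X) computes t(X) %*% X. Because every entry is 0 or 1, a patient contributes 1 to entry [i, j] only when both disease i and disease j are present, so each off-diagonal entry is the number of patients who have both diseases. The result is a symmetric matrix, so only the upper (or lower) triangle is needed.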
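Because the matrix is symmetric, its upper triangle lists each disease pair exactly once. A minimal sketch of turning that triangle into a long table of pairs, assuming a small hypothetical data frame as a stand-in for `data` (the `pairs` data frame and its column names `disease_a`, `disease_b`, `n_both` are illustrative, not part of the answer):

```r
# Hypothetical toy data standing in for `data`: ID column plus 0/1 indicators
data <- data.frame(
  ID    = 1:5,
  type1 = c(1, 0, 1, 1, 0),
  type2 = c(1, 1, 0, 1, 0),
  type3 = c(0, 0, 1, 1, 1)
)

data_matrix  <- as.matrix(data[, 2:ncol(data)])  # drop ID
cooccurrence <- crossprod(data_matrix)           # t(X) %*% X, named by column

# Row/column positions of the upper triangle (diagonal excluded by default)
idx <- which(upper.tri(cooccurrence), arr.ind = TRUE)

# One row per disease pair with the number of patients having both
pairs <- data.frame(
  disease_a = rownames(cooccurrence)[idx[, "row"]],
  disease_b = colnames(cooccurrence)[idx[, "col"]],
  n_both    = cooccurrence[idx]
)
print(pairs)

# Sanity check: a matrix entry equals the direct row-wise count for that pair
stopifnot(
  pairs$n_both[pairs$disease_a == "type1" & pairs$disease_b == "type2"] ==
    sum(data$type1 == 1 & data$type2 == 1)
)
```

upper.tri() excludes the diagonal unless diag = TRUE is passed, so zeroing the diagonal first is not needed for this step; the diagonal itself gives the number of patients with each disease.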