I have data on patient diagnoses. There are 13 columns, one per disease type, and each stores a binary value (0 = no disease, 1 = disease present). I want to see the co-occurrence of the diseases, i.e. how many patients who have disease type 1 also have disease type 2. My data (simplified):
ID <- sample(10:50, 20)
type1 <- sample(0:1, 20, replace = TRUE)
type2 <- sample(0:1, 20, replace = TRUE)
type3 <- sample(0:1, 20, replace = TRUE)
type4 <- sample(0:1, 20, replace = TRUE)
type5 <- sample(0:1, 20, replace = TRUE)
type6 <- sample(0:1, 20, replace = TRUE)
data <- data.frame(ID, type1, type2, type3, type4, type5, type6)
I am looking for something similar to the output of cor(), but giving counts rather than a frequency or correlation measure.
Try this:
data_matrix <- as.matrix(data[, 2:ncol(data)])
cooccurrence <- crossprod(data_matrix)
diag(cooccurrence) <- 0
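crossprod(data_matrix) computes t(data_matrix) %*% data_matrix, so entry [i, j] is the number of patients who have both disease i and disease j. The diagonal would otherwise hold each disease's own patient count, which is why it is set to 0. The result is a symmetric matrix, so every pair appears twice; take only the upper or lower triangle.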
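To take only the upper triangle, one option is to list each pair once in a long table. This is a minimal sketch, assuming a small made-up data set (`toy` is hypothetical, not the data from the question); the column names `disease_a`, `disease_b` and `count` are also just illustrative:

```r
# Hypothetical toy data: 5 patients, 3 binary disease flags
toy <- data.frame(
  ID    = 1:5,
  type1 = c(1, 1, 1, 1, 0),
  type2 = c(1, 0, 1, 1, 1),
  type3 = c(0, 0, 0, 1, 1)
)

# Drop the ID column and count co-occurrences for every pair of columns
m  <- as.matrix(toy[, -1])
co <- crossprod(m)

# Row/column positions of the strict upper triangle (each pair once, no diagonal)
idx <- which(upper.tri(co), arr.ind = TRUE)

pairs <- data.frame(
  disease_a = rownames(co)[idx[, "row"]],
  disease_b = colnames(co)[idx[, "col"]],
  count     = co[idx]
)
print(pairs)
```

Because upper.tri() already excludes the diagonal, zeroing it first is not needed for this table; it only matters if you keep the full matrix.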