I have a table that lists the presence/absence of each organism across several different conditions. My goal is to generate a new table that lists the values for all possible Venn Diagrams for each pair of organisms.
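Put another way: for each pair of organisms, I want a table summarizing the number of conditions where only the first organism was present, where both were present, and where only the second was present.
My current method is below. My real presence/absence table is much larger, though, so it would be great if there were a more concise way to automate this (e.g. a for-loop?).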
Example Presence/Absence Table (rows=conditions, columns=organisms):
library(data.table)
paData <- data.table(
Pyro = c(1,1,0,0,1,0,1),
Anth = c(0,1,0,1,0,1,1),
Tric = c(1,1,0,1,0,1,1))
paData
Pyro Anth Tric
1: 1 0 1
2: 1 1 1
3: 0 0 0
4: 0 1 1
5: 1 0 0
6: 0 1 1
7: 1 1 1
For each pair of organisms (columns), designate whether one, both, or neither organism was present in each condition (row):
paData$PyroAnth <- ifelse(paData[,1] ==1 &
paData[,2] ==0, "V1alone",
ifelse(paData[,1] ==1 &
paData[,2] ==1, "Overlap",
ifelse(paData[,1] ==0 &
paData[,2] ==1, "V2alone",
"NA")))
paData$PyroTric <- ifelse(paData[,1] ==1 &
paData[,3] ==0, "V1alone",
ifelse(paData[,1] ==1 &
paData[,3] ==1, "Overlap",
ifelse(paData[,1] ==0 &
paData[,3] ==1, "V2alone",
"NA")))
paData$AnthTric <- ifelse(paData[,2] ==1 &
paData[,3] ==0, "V1alone",
ifelse(paData[,2] ==1 &
paData[,3] ==1, "Overlap",
ifelse(paData[,2] ==0 &
paData[,3] ==1, "V2alone",
"NA")))
paData
Pyro Anth Tric PyroAnth PyroTric AnthTric
1: 1 0 1 V1alone Overlap V2alone
2: 1 1 1 Overlap Overlap Overlap
3: 0 0 0 NA NA NA
4: 0 1 1 V2alone V2alone Overlap
5: 1 0 0 V1alone V1alone NA
6: 0 1 1 V2alone V2alone Overlap
7: 1 1 1 Overlap Overlap Overlap
Create the desired output table: for each pair of organisms, count the number of conditions (rows) where each organism was present "alone" and where its presence "overlapped" with that of the second organism.
DesiredOutput <- data.frame(rbind(list(names(paData[,1]), names(paData[,2]),
nrow(paData[PyroAnth == "V1alone"]),
nrow(paData[PyroAnth == "Overlap"]),
nrow(paData[PyroAnth == "V2alone"])),
list(names(paData[,1]), names(paData[,3]),
nrow(paData[PyroTric == "V1alone"]),
nrow(paData[PyroTric == "Overlap"]),
nrow(paData[PyroTric == "V2alone"])),
list(names(paData[,2]), names(paData[,3]),
nrow(paData[AnthTric == "V1alone"]),
nrow(paData[AnthTric == "Overlap"]),
nrow(paData[AnthTric == "V2alone"]))))
colnames(DesiredOutput) <- c("V1", "V2", "V1alone", "Overlap", "V2alone")
DesiredOutput
V1 V2 V1alone Overlap V2alone
1 Pyro Anth 2 2 2
2 Pyro Tric 1 3 2
3 Anth Tric 0 4 1
How could this be automated to efficiently create my "DesiredOutput" table for dozens of organisms and hundreds of conditions?
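You could try this approach. Run it on the original paData, before the PyroAnth, PyroTric and AnthTric label columns are added, because combn() pairs up every column name: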
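# Counts for one pair: v1 only, both, v2 only (0/1 values are coerced to logical)
f <- function(v1, v2) list(sum(v1 & !v2), sum(v1 & v2), sum(!v1 & v2))
# One row per unordered pair of column names (columns V1 and V2)
result <- data.table(t(combn(names(paData), 2)))
result[, c("v1alone", "overlap", "v2alone") := f(paData[[V1]], paData[[V2]]), by = 1:nrow(result)]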
Output:
V1 V2 v1alone overlap v2alone
1: Pyro Anth 2 2 2
2: Pyro Tric 1 3 2
3: Anth Tric 0 4 1
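If the by-row call gets slow with dozens of organisms, the same counts can probably be obtained from a single matrix product. The following is only a base-R sketch, not part of the approach above. It assumes every column holds nothing but 0/1 values with no NAs, and the names pa, both, totals, pairs and pairCounts are made up for illustration. The idea is that crossprod() counts, for every pair of columns, the rows where both are 1, and each "alone" count is then that organism's total minus the overlap.

```r
# Sketch: all pairwise counts from one matrix product (base R only).
# Assumption: every column contains only 0/1 values and no NAs.
pa <- cbind(Pyro = c(1, 1, 0, 0, 1, 0, 1),
            Anth = c(0, 1, 0, 1, 0, 1, 1),
            Tric = c(1, 1, 0, 1, 0, 1, 1))

both   <- crossprod(pa)               # both[i, j] = rows where i and j are both present
totals <- colSums(pa)                 # rows where each organism is present
pairs  <- t(combn(colnames(pa), 2))   # one row per unordered pair of organisms

overlap <- both[pairs]                # look up each pair by (row name, column name)
pairCounts <- data.frame(
  V1      = pairs[, 1],
  V2      = pairs[, 2],
  V1alone = unname(totals[pairs[, 1]]) - overlap,
  Overlap = overlap,
  V2alone = unname(totals[pairs[, 2]]) - overlap
)
pairCounts
```

For the real data, pa could be built with as.matrix(paData), again taken before any label columns are added.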