My data look like this:
set.seed(89)
d <- data.frame(
ID=seq(1, 100),
Encounter=sample(c(1:50), 100, replace = TRUE),
EffortType=sample(c("A","B","C"), 100, replace = TRUE)
)
I treat the Encounter variable as a grouping factor.
I would like to know how many encounters have each possible combination of EffortType values.
I would like the results to look something like this:
EffortType N
A 8
B 8
C 9
A,B 4
A,C 8
B,C 5
A,B,C 3
I would also like to be able to subset the data by EffortType combination. For example, for the combination A,B I would end up with a subset that looks something like this:
ID Encounter EffortType
52 2 A
53 2 B
61 2 A
63 2 A
79 2 A
36 7 B
59 7 B
83 7 A
etc.
I tried to reshape the data with mutate() so that each level of EffortType had its own variable, and then to count the instances of each combination, but as shown below I am not doing this correctly. I can't figure out how to group by Encounter before doing the counting.
library(dplyr)
library(data.table)
d = mutate(d,
A = grepl("A", EffortType),
B = grepl("B", EffortType),
C = grepl("C", EffortType))
d = data.table(d)
d[, .N, by = c('Encounter', 'A', 'B', 'C')]
But I don't end up with the summary I'm hoping for: this counts rows for each Encounter and flag pattern, not encounters for each combination. Please help. Thanks.
I would make a separate table of encounter attributes, with one row per Encounter:
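library(data.table)
setDT(d)  # convert d to a data.table in place (harmless if it already is one)
# one row per Encounter; tt is its sorted, space-separated set of EffortTypes
EncounterDT = d[, .(tt = paste(sort(unique(EffortType)), collapse=" ")), keyby=Encounter]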
# count encounters by types
EncounterDT[, .N, keyby=tt][order(nchar(tt), tt)]
# subset d using a join
d[EncounterDT[tt == "A B", .(Encounter)], on=.(Encounter)]
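If you would rather stay close to the A/B/C flag columns from the question's attempt, the missing step is to collapse the flags to one row per Encounter (a type is present if any of the encounter's rows has it) before counting. A minimal sketch, recreating the question's data and assuming the flags are built with grepl() as in the question; enc_flags and flag_counts are illustrative names:

```r
library(data.table)

set.seed(89)
d <- data.frame(
  ID = seq(1, 100),
  Encounter = sample(c(1:50), 100, replace = TRUE),
  EffortType = sample(c("A", "B", "C"), 100, replace = TRUE)
)
setDT(d)

# per-row flags, as in the question's mutate() step (grepl already returns TRUE/FALSE)
d[, `:=`(A = grepl("A", EffortType),
         B = grepl("B", EffortType),
         C = grepl("C", EffortType))]

# collapse to one row per Encounter: a type is present if any row has it
enc_flags <- d[, .(A = any(A), B = any(B), C = any(C)), by = Encounter]

# count encounters per presence pattern (one row per observed combination)
flag_counts <- enc_flags[, .N, by = .(A, B, C)]
print(flag_counts)
```

This gives the same counts as the tt approach, just keyed by three logical columns instead of a single label.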
If you have a strong preference for using a single table, though, you can store the combination as a repeated column of d:
# add a repeating-value column
d[, tt := paste(sort(unique(EffortType)), collapse=" "), by=Encounter]
# count encounters by types
d[, .(N = uniqueN(Encounter)), keyby=tt][order(nchar(tt), tt)]
# subset d using the tt column
d[tt == "A B"]
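To get the subsets for every combination at once rather than one filter at a time, data.table's split() method can split by the tt column. A minimal sketch, recreating the question's data; the name subsets is illustrative, and each element is named after its tt value (for example subsets[["A B"]]):

```r
library(data.table)

set.seed(89)
d <- data.table(
  ID = seq(1, 100),
  Encounter = sample(c(1:50), 100, replace = TRUE),
  EffortType = sample(c("A", "B", "C"), 100, replace = TRUE)
)

# repeat each encounter's combination label on all of its rows
d[, tt := paste(sort(unique(EffortType)), collapse = " "), by = Encounter]

# named list with one data.table per combination
subsets <- split(d, by = "tt")
names(subsets)
subsets[["A B"]]
```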