I want to select groups that satisfy one of two conditions: 1) contain a and b; 2) contain a and c. Here is the dataset:
ff <- data.frame(id = c(1,1,2,2,3,3,4,4), value = c("a", "a", "a", "b", "a", "c", "b", "c"))
Therefore the selected groups should be 2 and 3.
How can I achieve that efficiently (my actual dataset is much larger)?
You can group by id and filter on those two conditions, as follows:
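library(dplyr)

ff <- data.frame(id = c(1, 1, 2, 2, 3, 3, 4, 4),
                 value = c("a", "a", "a", "b", "a", "c", "b", "c"))

ff %>%
  group_by(id) %>%
  # keep a group if it contains both "a" and "b", or both "a" and "c"
  filter(all(c("a", "b") %in% value) | all(c("a", "c") %in% value)) %>%
  # reduce the surviving rows to one row per group id
  distinct(id)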
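The filter keeps a group only when its values include both a and b, or both a and c. Because the condition is evaluated once per id, every row of a matching group is retained.
distinct(id) then collapses those rows so that each selected group id appears once (here, 2 and 3).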
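Since the question mentions a much larger dataset, here is a minimal base R sketch of the same selection that avoids per-group evaluation. It relies on the observation that "a and b, or a and c" is equivalent to "a, and at least one of b or c". The names ids_a, ids_bc, and selected are only illustrative, and I have not benchmarked this against the dplyr version.

```r
ff <- data.frame(id = c(1, 1, 2, 2, 3, 3, 4, 4),
                 value = c("a", "a", "a", "b", "a", "c", "b", "c"))

# ids of groups that contain "a" at least once
ids_a <- unique(ff$id[ff$value == "a"])

# ids of groups that contain "b" or "c" at least once
ids_bc <- unique(ff$id[ff$value %in% c("b", "c")])

# groups satisfying (a and b) or (a and c) are those present in both sets
selected <- intersect(ids_a, ids_bc)
selected
```

To get the rows rather than just the ids, you could subset with ff[ff$id %in% selected, ].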