Tags: r, dplyr, filter, data-cleaning

Select groups that satisfy one of the two conditions using dplyr


I want to select groups that satisfy one of the two conditions: 1) contain a and b; 2) contain a and c. Here is the dataset:

ff <- data.frame(id = c(1,1,2,2,3,3,4,4), value = c("a", "a", "a", "b", "a", "c", "b", "c"))

Therefore the selected groups should be 2 and 3.

How can I achieve that efficiently (I have a much larger dataset)?


Solution

  • You can group by id and filter on those two conditions, as suggested:

    library(dplyr)
    
    ff <- data.frame(id = c(1, 1, 2, 2, 3, 3, 4, 4), 
                     value = c("a", "a", "a", "b", "a", "c", "b", "c"))
    
    ff %>%
      group_by(id) %>%
      filter(all(c("a", "b") %in% value) | all(c("a", "c") %in% value)) %>%
      distinct(id)
    

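    The filter keeps a group only if its value column contains both a and b, or both a and c. Each all(... %in% value) call is evaluated once per group, so it yields a single TRUE or FALSE that applies to every row of that group.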

    distinct(id) then collapses the remaining rows to one row per group, giving the ids of the selected groups (here 2 and 3).
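
Because both conditions require a, the test can also be written as "a is present, and at least one of b or c is present". The sketch below is one possible variant along those lines, not the method from the answer itself: it computes one logical flag per group with summarise() and returns the ids as a plain vector with pull(). The names keep and selected are introduced here purely for illustration, and no claim is made about how it performs on a larger dataset.

```r
library(dplyr)

ff <- data.frame(id = c(1, 1, 2, 2, 3, 3, 4, 4),
                 value = c("a", "a", "a", "b", "a", "c", "b", "c"))

selected <- ff %>%
  group_by(id) %>%
  # one row per group: TRUE if the group has "a" plus at least one of "b"/"c"
  summarise(keep = "a" %in% value && any(c("b", "c") %in% value)) %>%
  # drop the groups that failed the test
  filter(keep) %>%
  # extract the ids as a plain vector (hypothetical name: selected)
  pull(id)

selected
```

For the example data this returns the ids 2 and 3, matching the groups named in the question. Since summarise() already produces one row per group, no distinct() step is needed.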