Remove/select groups based on conditions in two different columns in R


I have a dataset that looks like the one below (df), where ID identifies individuals grouped into two clusters. Each cluster was built around a target individual (Target = Yes), and both targeted and non-targeted individuals may or may not have been used (Used = Yes or No).

    Cluster ID  Target Used
    1       1   Yes    Yes 
    1       2   No     No  
    1       3   No     Yes 
    2       1   No     No  
    2       2   Yes    No  
    2       3   No     Yes 
    2       4   No     Yes 
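For reproducibility, the table above can be rebuilt as a data frame. This is a minimal sketch: the column types (numeric Cluster and ID, character Target and Used) are an assumption, since the question does not state them.

```r
# Example data reconstructed from the table in the question.
# Column types are assumed; the real data may use factors or integers.
df <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 2, 2),
  ID      = c(1, 2, 3, 1, 2, 3, 4),
  Target  = c("Yes", "No", "No", "No", "Yes", "No", "No"),
  Used    = c("Yes", "No", "Yes", "No", "No", "Yes", "Yes")
)
```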

I want to filter this dataset (mine is around 3000 rows long) so that it keeps only whole clusters where the target individual is used (Target = Yes and Used = Yes) and removes every other cluster in full. The expected output looks like this:

    Cluster ID Target Used
    1       1   Yes   Yes
    1       2   No    No
    1       3   No    Yes

I tried dplyr solutions, for instance:

    grouped_df <- df %>% group_by(Cluster)

    grouped_df %>%
    filter(all(yes %in% Used) & all(yes %in% Target)) %>%  distinct(Cluster)

But this code returns a data frame that only includes rows where Target = Yes and removes everything else. I need code that keeps each cluster together and selects or removes it as a whole, depending on whether the target individual (Target = Yes) is used.


Solution

  • Using filter

    library(dplyr)
    
    df %>% 
      filter(any(Target == "Yes" & Used == "Yes"), .by = Cluster)
    #>   Cluster ID Target Used
    #> 1       1  1    Yes  Yes
    #> 2       1  2     No   No
    #> 3       1  3     No  Yes
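
The `.by` argument is only available in dplyr 1.1.0 and later. On older versions, the same idea can be sketched with `group_by()` and `ungroup()`. Negating the condition with `!` removes the matching clusters instead of keeping them. The data frame below is rebuilt from the question's table, and the names `kept` and `dropped` are illustrative only.

```r
library(dplyr)

# Example data reconstructed from the question (column types assumed)
df <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 2, 2),
  ID      = c(1, 2, 3, 1, 2, 3, 4),
  Target  = c("Yes", "No", "No", "No", "Yes", "No", "No"),
  Used    = c("Yes", "No", "Yes", "No", "No", "Yes", "Yes")
)

# Keep whole clusters that contain a used target individual
kept <- df %>%
  group_by(Cluster) %>%
  filter(any(Target == "Yes" & Used == "Yes")) %>%
  ungroup()

# Remove those clusters instead by negating the group-level condition
dropped <- df %>%
  group_by(Cluster) %>%
  filter(!any(Target == "Yes" & Used == "Yes")) %>%
  ungroup()
```

Because `any()` returns a single TRUE or FALSE per group, `filter()` keeps or drops every row of a cluster together, which is what keeps the clusters intact.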