I have a dataset that looks like the one below (df), where ID identifies individuals grouped into two clusters. These clusters were built around a target individual (Target = Yes), and both targeted and non-targeted individuals may or may not have been used.
Cluster ID Target Used
1 1 Yes Yes
1 2 No No
1 3 No Yes
2 1 No No
2 2 Yes No
2 3 No Yes
2 4 No Yes
I want to filter this dataset (mine is around 3000 rows long) so that it keeps only whole clusters where the target individual is used (Target = Yes & Used = Yes) and removes every other cluster. The expected output should look like this:
Cluster ID Target Used
1 1 Yes Yes
1 2 No No
1 3 No Yes
I tried dplyr solutions, for instance:
grouped_df <- df %>% group_by(Cluster)
grouped_df %>%
filter(all(yes %in% Used) & all(yes %in% Target)) %>% distinct(Cluster)
But this code returns a data frame that only includes rows where Target = Yes and removes everything else. I need code that keeps each cluster together and keeps or removes the whole cluster depending on whether its target individual (Target = Yes) is used.
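Using filter with the .by argument (available in dplyr 1.1.0 and later), keep every cluster in which at least one row has both Target and Used equal to "Yes":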
library(dplyr)
df %>%
filter(any(Target == "Yes" & Used == "Yes"), .by = Cluster)
Cluster ID Target Used
1 1 1 Yes Yes
2 1 2 No No
3 1 3 No Yes
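For a reproducible check, here is a minimal sketch that rebuilds the example data from the question and applies the same condition with group_by(), which should also work on dplyr versions that predate the .by argument. The object names kept and dropped are only illustrative, and the column types (numeric Cluster and ID, character Target and Used) are an assumption about how the real data is stored.

```r
library(dplyr)

# Rebuild the example data from the question (column types are assumed)
df <- data.frame(
  Cluster = c(1, 1, 1, 2, 2, 2, 2),
  ID      = c(1, 2, 3, 1, 2, 3, 4),
  Target  = c("Yes", "No", "No", "No", "Yes", "No", "No"),
  Used    = c("Yes", "No", "Yes", "No", "No", "Yes", "Yes")
)

# Keep whole clusters whose target individual is used
kept <- df %>%
  group_by(Cluster) %>%
  filter(any(Target == "Yes" & Used == "Yes")) %>%
  ungroup()

# Opposite selection: drop the clusters whose target individual is used
dropped <- df %>%
  group_by(Cluster) %>%
  filter(!any(Target == "Yes" & Used == "Yes")) %>%
  ungroup()
```

Because any() collapses the row-wise comparison into a single TRUE or FALSE per group, filter() keeps or drops all rows of a cluster at once. Negating it with ! gives the complementary set of clusters.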