r dplyr filter conditional-statements

How to apply conditional filtering based on group size?


I would like to filter rows conditionally, based on the group size.

Suppose I have a data frame that looks like this:

data1 <- data.frame(
  ID = c(1, 1, 1, 3, 3, 5, 6),
  town = c("Town A", "Town A", "Town B", "Town A", "Town C", "Town B", "Town A"),
  place = c("A", "B", "A", "B", "C", "A", "B"),
  place1 = c("A", "c", "A", "B", "C", "A", "D"),
  test = c("G", "B", "A", "B", "C", "A", "B"),
  test1 = c("G", "B", "A", "B", "d", "A", "B")
)

I want to keep one town per ID. The first filter is place == place1; if a group still has more than one row after that, I want to filter on test == test1 as well.

I've tried something like:

data1 %>%
  group_by(ID) %>%
  filter(if (n() >= 2) place == place1 else test == test1) %>%
  filter(n() == 1) %>%
  ungroup()

But the if/else does not work: groups 1 and 3 are missing from the result.


Solution

  • Sort your data by your conditions (descending, so that TRUE comes before FALSE), and then slice 1 row per group:

    data1 |>
      arrange(ID, desc(place == place1), desc(test == test1)) |>
      slice(1, .by = ID)
    #   ID   town place place1 test test1
    # 1  1 Town A     A      A    G     G
    # 2  3 Town A     B      B    B     B
    # 3  5 Town B     A      A    A     A
    # 4  6 Town A     B      D    B     B
    

    Note that if there are ties (like rows 1 and 3 in your original data), this will probably keep the first one, but I wouldn't count on it.
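
If you would rather not rely on how ties happen to be ordered, one option is to make the tie-break explicit by adding the original row position as a final sort key. This is only a sketch: `row_id` is a hypothetical helper column that is not part of your data, and the `.by` argument of `slice()` assumes dplyr 1.1.0 or later.

```r
library(dplyr)

data1 <- data.frame(
  ID = c(1, 1, 1, 3, 3, 5, 6),
  town = c("Town A", "Town A", "Town B", "Town A", "Town C", "Town B", "Town A"),
  place = c("A", "B", "A", "B", "C", "A", "B"),
  place1 = c("A", "c", "A", "B", "C", "A", "D"),
  test = c("G", "B", "A", "B", "C", "A", "B"),
  test1 = c("G", "B", "A", "B", "d", "A", "B")
)

result <- data1 |>
  # row_id (hypothetical name) records the original row order
  mutate(row_id = row_number()) |>
  # rows meeting each condition sort first; row_id breaks any remaining ties
  arrange(ID, desc(place == place1), desc(test == test1), row_id) |>
  # keep the first row of each ID
  slice(1, .by = ID) |>
  select(-row_id)

result
```

With the example data this should pick the same four rows as the shorter version, but ties within an ID are now resolved by original row order on purpose rather than by accident.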