Search code examples
rfilterdplyrcase-when

Filter out rows when condition is met with dplyr


How to filter out rows in one column when condition is met in another column for different groups?

For example:

library(dplyr)

df1 <-tribble(
   ~group, ~var1, ~var2,
  "a", 0, 0,
  "a", 1, 0,
  "a",1, 0,
  "a",0, 1,
  "a", 1, 0,
  "b", 1, 0,
  "b", 0, 1,
  "b", 1, 0,
  "b", 0, 1)

I want to allow ones in var1 only after having the first 1 in var2. Therefore, in this example, I would like to get:

group  var1  var2
  <chr> <dbl> <dbl>
 a         0     0
 a         0     1
 a         1     0
 b         0     1
 b         1     0
 b         0     1

I can identify from where I want to start filtering the data, but don't know exactly how to proceed:

df1 %>% 
  group_by(var2,group) %>% 
  mutate(test = case_when(row_number() == 1 & var2 == 1 ~ "exclude_previous_rows",
                         T ~ "n"))

I'm sure there is a simple way to do this with dplyr, but couldn't find it so far.


Solution

  • We can use a cumulative sum. I think this is what you want:

    df1 %>%
      group_by(group) %>%
      filter(cumsum(var2 == 1) > 0)
    # # A tibble: 5 x 3
    # # Groups:   group [2]
    #   group  var1  var2
    #   <chr> <dbl> <dbl>
    # 1 a         0     1
    # 2 a         1     0
    # 3 b         0     1
    # 4 b         1     0
    # 5 b         0     1
    

    This will keep all rows including and after the first 1 in var2, by group. I'm not really sure what you mean by "I want to allow ones in var1" - your code seems to ignore var1, and mine follows suit.