Tags: r, dplyr, data-cleaning, data-wrangling

How to remove rows after the first occurrence of a particular value?


In the following data, for each id, I would like to remove rows after the first 1 is reached. My data is as follows:

 id x
  a 0
  a 0
  a 1
  a 0
  a 1
  b 0
  b 1
  b 1
  b 0

The desired output:

 id x
  a 0
  a 0
  a 1
  b 0
  b 1

Code to reproduce data:

df <- structure(list(id = c("a", "a", "a", "a", "a", "b", "b", "b", 
"b"), x = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-9L))

This is not a duplicate, although it is similar to another question. I used cumall() to drop rows once the first 1 is reached, but this drops the first 1 itself along with everything after it:

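library(dplyr)

# cumall() is TRUE only while every x so far in the group is not 1,
# so the row holding the first 1 is filtered out as well
df <- df %>%
  group_by(id) %>%
  filter(cumall(!(x == 1))) %>%
  ungroup()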

But the caveat here is that I want to include the row with the first 1 as well. Any help is appreciated, preferably using dplyr!


Solution

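    library(dplyr)

    df %>%
      group_by(id) %>%
      # lag(!x) shifts the "x is 0" flag down one row, so each group starts with NA;
      # cumall() then stays NA through the first 1 and is FALSE afterwards
      mutate(y = cumall(lag(!x))) %>%
      # keep the rows where y is still NA, i.e. up to and including the first 1
      filter(is.na(y)) %>%
      select(-y)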
    
      id        x
      <chr> <int>
    1 a         0
    2 a         0
    3 a         1
    4 b         0
    5 b         1
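
The answer above depends on how cumall() treats the leading NA that lag() produces. As a variation (a sketch, not part of the original answer), the same shift can be written with an explicit default so that no NA is involved: lag(x != 1, default = TRUE) asks "was the previous row still not 1?", treating the first row of each group as TRUE, and cumall() keeps rows while that holds. The names df and result are just for illustration.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  x  = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)
)

result <- df %>%
  group_by(id) %>%
  # x != 1 is TRUE before the first 1; lagging it by one row (with TRUE
  # filled in for the first row) lets the first 1 pass the filter too
  filter(cumall(lag(x != 1, default = TRUE))) %>%
  ungroup()

print(result)
```

On the sample data this keeps the same five rows as the desired output. A group that starts with 1 keeps just that first row, because the default fills in TRUE for it.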