Search code examples
rdplyrwindow-functions

filter all rows smaller than x with all following values also smaller than x


I am looking for a concise way to filter a data.frame for all rows smaller than a value x with all following values also smaller than x. I found a way but it is somehwat verbose. I tried to do it with dplyr::cumall and cumany, but was not able to figure it out.

Here is a small reprex including my actual approach. Ideally I would only have one filter line or mutate + filter, but with the current approach it takes two rounds of mutate/filter.

library(dplyr)

# Original data
tbl <- tibble(value = c(100,100,100,10,10,5,10,10,5,5,5,1,1,1,1))

# desired output:
# keep only rows, where value is smaller than 5 and ...
# no value after that is larger than 5

tbl %>% 
  mutate(id = row_number()) %>% 
  filter(value <= 5) %>%
  mutate(id2 = lead(id, default = max(id) + 1) - id) %>% 
  filter(id2 == 1)

#> # A tibble: 7 x 3
#>   value    id   id2
#>   <dbl> <int> <dbl>
#> 1     5     9     1
#> 2     5    10     1
#> 3     5    11     1
#> 4     1    12     1
#> 5     1    13     1
#> 6     1    14     1
#> 7     1    15     1

Created on 2020-04-20 by the reprex package (v0.3.0)


Solution

  • You could combine cummin with a reversed reverse cummax:

     tbl %>% filter(rev(cummax(rev(value))) <= 5 & cummin(value) <= 5)
    # A tibble: 7 x 1
      value
      <dbl>
    1     5
    2     5
    3     5
    4     1
    5     1
    6     1
    7     1