Subset a data frame with dplyr using a custom rule


I have a dataframe like the following:

df <- data.frame(num = c(1, 2, 4, 5, 7, 9, 10), value = c('a', 'b', 'c', 'd', 'e', 'f', 'g'))

I would like to subset the data frame to the rows that belong to a run of consecutive numbers, i.e. rows whose num differs by exactly 1 from the previous row or from the next row. My output should look like the following:

    num value
1     1     a
2     2     b
3     4     c
4     5     d
5     9     f
6    10     g

I tried the following code:

library(dplyr)

# difference between each num and the previous one (0 for the first row)
df_subset <- df %>%
  mutate(difference = num - lag(num, default = first(num))) %>%
  filter(difference == 1 | row_number() == 1)

The output excludes 4 and 9:

   num value
1     1     a
2     2     b
3     5     d
4    10     g

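This happens because the difference from the previous row is 2 for both of them, even though each one starts a new run. How can I modify this so that every row belonging to a run of consecutive numbers is kept?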
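To see where the attempt goes wrong, it can help to look at the gap to both neighbours of each row. This is a minimal sketch, assuming dplyr is installed; the column names diff_prev and diff_next are hypothetical and chosen only for illustration.

```r
library(dplyr)

df <- data.frame(num = c(1, 2, 4, 5, 7, 9, 10),
                 value = c('a', 'b', 'c', 'd', 'e', 'f', 'g'))

# diff_prev: gap to the previous row (NA for the first row)
# diff_next: gap to the next row (NA for the last row)
checked <- df %>%
  mutate(diff_prev = num - lag(num),
         diff_next = lead(num) - num)

print(checked)
```

The rows with num 4 and 9 have diff_prev equal to 2 but diff_next equal to 1, so a rule that only looks backward drops them. Only the row with num 7 has no neighbour at distance 1.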


Solution

  • You could use diff twice instead of lag, keeping a row if its num differs by exactly 1 from the previous row or from the next row:

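    library(dplyr)
    df %>%
      # pad with NA, not 1, so an isolated first row is not kept by default;
      # filter() drops rows where the condition evaluates to NA
      filter(c(NA, diff(num)) == 1 | c(diff(num), NA) == 1)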
    

      num value
    1   1     a
    2   2     b
    3   4     c
    4   5     d
    5   9     f
    6  10     g
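Since the question also mentions creating groups for each series, here is an alternative sketch that labels each run explicitly and keeps only runs with two or more rows. This is a different technique from the diff-based filter above (a run-id built with cumsum); the column name run is hypothetical, and dplyr is assumed to be installed.

```r
library(dplyr)

df <- data.frame(num = c(1, 2, 4, 5, 7, 9, 10),
                 value = c('a', 'b', 'c', 'd', 'e', 'f', 'g'))

runs <- df %>%
  # a new run starts wherever the gap to the previous row is not exactly 1;
  # cumsum turns these start flags into run ids 1, 1, 2, 2, 3, 4, 4
  mutate(run = cumsum(c(TRUE, diff(num) != 1))) %>%
  group_by(run) %>%
  # keep only runs that contain at least two rows
  filter(n() > 1) %>%
  ungroup()

print(runs)
```

The run column can be dropped afterwards with select(-run) if it is not needed. Staying closer to the original lag attempt, the same rows can also be selected with filter(num - lag(num) == 1 | lead(num) - num == 1), which relies on the same NA handling in filter().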