I have a dataframe like the following:
df <- data.frame(num = c(1, 2, 4, 5, 7, 9, 10), value = c('a', 'b', 'c', 'd', 'e', 'f', 'g'))
I would like to subset the dataframe to the rows whose num is part of a run of consecutive integers, i.e. rows with a neighbour exactly 1 away. My output should look like the following:
  num value
1   1     a
2   2     b
3   4     c
4   5     d
5   9     f
6  10     g
With the code below,
library(dplyr)

df_subset <- df %>%
  mutate(difference = num - lag(num, default = first(num))) %>%
  filter(difference == 1 | row_number() == 1)
The output excludes 4 and 9:
  num value
1   1     a
2   2     b
3   5     d
4  10     g
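Rows 4 and 9 are dropped because their difference from the previous row is 2, not 1, even though each of them starts a new run. How can I modify this to keep every row that belongs to a run of consecutive numbers?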
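You could use diff twice instead of lag, once to compare each row with the previous one and once with the next one, so that a row is kept when either neighbour is exactly 1 away: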
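df %>%
  # keep a row if num is 1 more than the previous value or 1 less than the next;
  # NA padding (rather than 1) avoids always keeping an isolated first row
  filter(c(NA, diff(num)) == 1 | c(diff(num), NA) == 1)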
  num value
1   1     a
2   2     b
3   4     c
4   5     d
5   9     f
6  10     g
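If you also want the runs labelled as groups, which the question seems to hint at, here is a different technique from the diff filter above: start a new run wherever the gap to the previous num is not 1, then drop runs of length one. This is only a sketch; run_id and df_runs are names made up for illustration, and it assumes num is sorted ascending and the dataframe has at least one row.

```r
library(dplyr)

df <- data.frame(num = c(1, 2, 4, 5, 7, 9, 10),
                 value = c("a", "b", "c", "d", "e", "f", "g"))

df_runs <- df %>%
  # a new run starts at the first row and wherever the gap is not exactly 1
  mutate(run_id = cumsum(c(TRUE, diff(num) != 1))) %>%
  group_by(run_id) %>%
  # drop runs of length 1, i.e. isolated values such as 7
  filter(n() > 1) %>%
  ungroup()

df_runs
```

With the example data this keeps the same six rows, and run_id tells you which run each row belongs to (the ids of dropped runs are simply skipped, so they are not renumbered).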