Find a row with a different value from the previous row in R dplyr


I have a data frame in which x is the row number by group, value is a variable, and change is the difference from the previous row.

I would like to create a group variable. If change differs from its value in the previous row, group is set to the current row number. If it is the same, group is set to the row number of the first row in that run of equal values.

df <- data.frame(x = c(1:11),
           value = c(0, 3, 1, 1, 3, 1, 2, 0, 0, 0, 0),
           change = c(0, -3, 2, 2, -3, 2, -1, 0, 0, 0, 0))

> df
    x value change
1   1     0      0
2   2     3     -3
3   3     1      2
4   4     1      2
5   5     3     -3
6   6     1      2
7   7     2     -1
8   8     0      0
9   9     0      0
10 10     0      0
11 11     0      0

Here is the desired df:

> df <- data.frame(x = c(1:11),
+            value = c(0, 3, 1, 1, 3, 1, 2, 0, 0, 0, 0),
+            change = c(0, -3, 2, 2, -3, 2, -1, 0, 0, 0, 0),
+            group = c(1, 2, 3, 3, 5, 6, 7, 8, 8, 8, 8))
> df
    x value change group
1   1     0      0     1
2   2     3     -3     2
3   3     1      2     3
4   4     1      2     3
5   5     3     -3     5
6   6     1      2     6
7   7     2     -1     7
8   8     0      0     8
9   9     0      0     8
10 10     0      0     8
11 11     0      0     8


Solution

  • 1) Use consecutive_id (available in dplyr 1.1.0 and later) to assign an increasing id to each run of consecutive equal values, then use match(g, g) to fill each run with the row number of its first element.

    library(dplyr)
    
    make_group <- function (x) {
      g <- consecutive_id(x)
      match(g, g)
    }
    
    df %>%
      mutate(group = make_group(change))
    

    giving

        x value change group
    1   1     0      0     1
    2   2     3     -3     2
    3   3     1      2     3
    4   4     1      2     3
    5   5     3     -3     5
    6   6     1      2     6
    7   7     2     -1     7
    8   8     0      0     8
    9   9     0      0     8
    10 10     0      0     8
    11 11     0      0     8
    

    2) In this approach we use ave to split row_number() by consecutive_id(change) and take the first row number in each run. The output is the same as in (1).

    df %>%
      mutate(group = ave(row_number(), consecutive_id(change), FUN = first))
    

    2a) This variation of (2) uses the .by= argument of mutate. It relies on x being the row number in df, so first(x) within each run is the row number where that run starts. This gives the same answer again:

    df %>%
      mutate(consec = consecutive_id(change)) %>%
      mutate(group = first(x), .by = consec) %>%
      select(-consec)
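
To see why match(g, g) in (1) gives the first row number of each run, it can help to print the intermediate vectors. This is only an illustrative sketch using the change column from the question; the variable names g and first_row are chosen here for the example and are not part of the answer above.

```r
library(dplyr)

change <- c(0, -3, 2, 2, -3, 2, -1, 0, 0, 0, 0)

# Step 1: give each run of consecutive equal values its own id
g <- consecutive_id(change)
g
#> [1] 1 2 3 3 4 5 6 7 7 7 7

# Step 2: match(g, g) returns, for each element, the position of the
# first occurrence of its id, i.e. the row where its run starts
first_row <- match(g, g)
first_row
#> [1] 1 2 3 3 5 6 7 8 8 8 8
```

Note that the value 2 appears in rows 3-4 and again in row 6, but because consecutive_id only looks at adjacent rows, row 6 starts a new run and gets its own row number rather than 3.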