Search code examples
rcumsum

Create a counter in dataframe that gets reset based on changes in a value


I have a df with an ID column and another column (Base/Promo) coding 1 or 0. I'd like to create a third, new column that contains a counter that counts if value is 0 in (Base/Promo) and gets reset to 0 if the value in (Base/Promo) is 1 and starts counting again when the value is 0. Also, the counter should be reset whenever the ID changes. Here is a reproducible df & an example of how the output should look like:

df <- data.frame(ID = c(rep("1", 6), rep("2", 6), 
rep("3", 6) rep("4", 6)), 
response=c(0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0))             
df

Desired outcome:

      ID     Base/Promo     counter
1     1         0           1
2     1         0           2
3     1         0           3
4     1         1           0
5     1         0           1
6     1         0           2
7     2         1           0
8     2         0           1
9     2         0           2
10    2         0           3
11    2         0           4
12    2         1           0
13    3         0           1
14    3         0           2
15    3         0           3
16    3         1           0
17    3         1           0
18    3         0           1
19    4         0           1
20    4         0           2
21    4         0           3
22    4         1           0
23    4         1           0
24    4         0           1

I tried with the following code:

df <- df %>%
group_by(ID, idx = cumsum("Base/Promo" == 1L)) %>%
mutate(counter= row_number()) %>%
ungroup %>%
  select(-idx)

But this gives me the following outcome:

      ID     Base/Promo     counter
1     1         0           1
2     1         0           2
3     1         0           3
4     1         1           1
5     1         0           2
6     1         0           3
7     2         1           1
8     2         0           2
9     2         0           3
10    2         0           4
11    2         0           5
12    2         1           1
13    3         0           2
14    3         0           3
15    3         0           4
16    3         1           1
17    3         1           1
18    3         0           2
19    4         0           1
20    4         0           2
21    4         0           3
22    4         1           1
23    4         1           1
24    4         0           2

I hope you can help with this problem, as I'm out of ideas of what else to try. In case I should have overlooked an SO entry that solves my problem, it would be great if you could point me there. Many thanks!!!


Solution

  • We can adjust the counter values based on first value of the group :

    library(dplyr)
    
    df %>%
      group_by(ID, grp = cumsum(response == 1L)) %>%
      mutate(counter = if(first(response) == 1L) row_number() - 1 
                       else row_number()) %>%
      ungroup() %>%
      dplyr::select(-grp)
    
    # A tibble: 24 x 3
    #   ID    response counter
    #   <chr>    <dbl>   <dbl>
    # 1 1            0       1
    # 2 1            0       2
    # 3 1            0       3
    # 4 1            1       0
    # 5 1            0       1
    # 6 1            0       2
    # 7 2            1       0
    # 8 2            0       1
    # 9 2            0       2
    #10 2            0       3
    # … with 14 more rows