Tags: r, for-loop, if-statement, lag, cumsum

Continual summation of a column in R until condition is met


I am doing my best to learn R, and this is my first post on this forum.

I currently have a data frame with a populated vector "x" and an unpopulated vector "counter" as follows:

x <- c(NA,1,0,0,0,0,1,1,1,1,0,1)

df <- data.frame("x" = x, "counter" = 0)

    x counter
1  NA       0
2   1       0
3   0       0
4   0       0
5   0       0
6   0       0
7   1       0
8   1       0
9   1       0
10  1       0
11  0       0
12  1       0

I am having a surprisingly difficult time trying to write code that will simply populate counter so that counter sums the cumulative, sequential 1s in x, but reverts back to zero when x is zero. Accordingly, I would like counter to calculate as follows per the above example:

    x counter
1  NA      NA
2   1       1
3   0       0
4   0       0
5   0       0
6   0       0
7   1       1
8   1       2
9   1       3
10  1       4
11  0       0
12  1       1

I have tried using lag() and ifelse(), both with and without for loops, but I seem to be getting further and further away from a workable solution. lag() got me close, but the figures were not calculating as expected, and my ifelse() and for-loop attempts eventually ended up returning length-1 vectors of NA_real_, NA or 1. I have also considered cumsum(), but I am not sure how to restrict it to just the runs of 1s. I have searched and reviewed similar posts, for example "How to add value to previous row if condition is met"; however, I still cannot figure out what I would expect to be a very simple task.

Admittedly, I am at a low point in my early R learning curve and greatly appreciate any help and constructive feedback anyone from the community can provide. Thank you.


Solution

  • You can use:

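    library(dplyr)
    
    df %>%
      # Start a new group at every 0 (NA is treated as 0 for grouping only)
      group_by(x1 = cumsum(replace(x, is.na(x), 0) == 0)) %>%
      # Zero-based position within the group, times x: 0 stays 0, NA stays NA
      mutate(counter = (row_number() - 1) * x) %>%
      ungroup() %>%
      # Drop the helper grouping column
      select(-x1)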
    
    #       x counter
    #   <dbl>   <dbl>
    # 1    NA      NA
    # 2     1       1
    # 3     0       0
    # 4     0       0
    # 5     0       0
    # 6     0       0
    # 7     1       1
    # 8     1       2
    # 9     1       3
    #10     1       4
    #11     0       0
    #12     1       1
    

    Explaining the steps -

    • Create a new grouping column (x1): replace NA in x with 0, then use cumsum to increment the group value by 1 whenever x is 0, so that every 0 starts a new group.
    • Within each group, subtract 1 from the row number and multiply the result by x. The multiplication is necessary because it keeps counter at 0 where x is 0 and at NA where x is NA.
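To make the two steps easier to follow, here is a minimal base R sketch that builds the same result one intermediate vector at a time. The names grp and pos are illustrative only and do not appear in the answer above, and ave() is swapped in for the dplyr grouping, so treat this as one possible way to inspect the logic rather than the answer's own method.

```r
x <- c(NA, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1)

# Step 1: treat NA as 0, then start a new group at every 0.
grp <- cumsum(replace(x, is.na(x), 0) == 0)

# Step 2: zero-based position of each row within its group
# (the base R counterpart of row_number() - 1).
pos <- ave(seq_along(x), grp, FUN = seq_along) - 1

# Step 3: multiplying by x forces 0 where x is 0 and keeps NA where x is NA.
counter <- pos * x

# Show the helper vectors side by side with the result.
data.frame(x, grp, pos, counter)
```

Note that this logic relies on each run of 1s being preceded by a 0 or an NA, as in the example data. If x started with a 1, the first run would be counted from 0 instead of 1, so that case would need separate handling.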