
Count the number of NA values in a row - reset when 0


I came across the question "Cumulative sum that resets when 0 is encountered", whose answer at https://stackoverflow.com/a/32502162/13269143 partially, but not fully, answered my question. I first wanted to create a column that accumulates the values in column b row by row, restarting the sum each time a 0 is encountered. I achieved this with the code:

setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)]

as suggested in https://stackoverflow.com/a/32502162/13269143 (the other solutions provided there did not work for me; they only produced NA values). Now I also want to create a third column, "What I Want" in the illustration below, that assigns to every observation in a sequence the maximum accumulated value reached in that sequence. To illustrate:

b     Accumulated   What I Want
1      1            3
1      2            3
1      3            3
0      0            0
1      1            4
1      2            4
1      3            4
1      4            4
0      0            0
0      0            0
0      0            0
1      1            2
1      2            2

There might be a very simple way to do this. Thank you in advance.


Solution

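  • You can use max instead of cumsum, with the same grouping as in your attempt. This assumes the running total is stored in a column named Accumulated, as in your illustration: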

    library(data.table)
    setDT(df)[, whatiwant := max(Accumulated), by = rleid(b == 0L)]
    df
    
    #    b Accumulated whatiwant
    # 1: 1           1         3
    # 2: 1           2         3
    # 3: 1           3         3
    # 4: 0           0         0
    # 5: 1           1         4
    # 6: 1           2         4
    # 7: 1           3         4
    # 8: 1           4         4
    # 9: 0           0         0
    #10: 0           0         0
    #11: 0           0         0
    #12: 1           1         2
    #13: 1           2         2
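
For anyone who wants to reproduce this from scratch, here is a minimal sketch that rebuilds the b column from the question's illustration and computes both columns in turn. The helper column name grp is only an illustrative assumption; it is not part of the original question or answer, and it simply stores the result of rleid(b == 0L) so the grouping can be inspected.

```r
library(data.table)

# Rebuild the b column from the question's illustration
df <- data.table(b = c(1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1))

# rleid() gives one id per run of consecutive identical values of (b == 0),
# so each stretch of non-zero rows and each stretch of zeros is its own group.
# grp is a hypothetical helper column, kept only to make the groups visible.
# For this data grp is 1,1,1,2,3,3,3,3,4,4,4,5,5
df[, grp := rleid(b == 0L)]

# Running total within each run (the "Accumulated" column)
df[, Accumulated := cumsum(b), by = grp]

# Largest running total of the run, repeated on every row of that run
df[, whatiwant := max(Accumulated), by = grp]

df
```

Storing the group id in its own column is optional; passing by = rleid(b == 0L) directly, as in the answer above, gives the same result without the extra column.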