Search code examples
rdata.tablecounter

incremental counter within dataframe only when a condition is met in r


I would like to create an accumulative incremental counter that increases only when a condition is met.

DT <- data.table(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2), 
               b = c(10L, 5L, 3L, 4L, 2L, 6L, 1L, 3L, 5L, 7L))

I don't get the desired result with rleid because when two conditions are met in consecutive rows, the increment is not performed

> DT[,count := rleid(b>=5),id]
> DT
    id  b count
 1:  1 10     1
 2:  1  5     1
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     4
 8:  2  3     1
 9:  2  5     2
10:  2  7     2

The expected result is

> DT
    id  b count
 1:  1 10     1
 2:  1  5     2
 3:  1  3     2
 4:  1  4     2
 5:  1  2     2
 6:  1  6     3
 7:  1  1     3
 8:  2  3     1
 9:  2  5     2
10:  2  7     3

Solution

  • Here is an option with cumsum. Grouped by 'id', get the cumulative sum of logical expression (b >= 5). For 'id' 2, the first element that is greater than or equal to 5 is at position 2 (in the grouped position), thus the first row will be 0. Inorder to make this 1, an option is to convert it to factor and then coerce to integer so that we get the integer storage values (R indexing starts from 1)

    DT[, count := as.integer(factor(cumsum(b >= 5))), id]
    

    -output

    DT
        id  b count
     1:  1 10     1
     2:  1  5     2
     3:  1  3     2
     4:  1  4     2
     5:  1  2     2
     6:  1  6     3
     7:  1  1     3
     8:  2  3     1
     9:  2  5     2
    10:  2  7     3