
How to detect values in a table's column and subtract them from another column?


Data

I'm working with data.table and need to perform a fairly involved manipulation. My data looks like this:

library(data.table)
A <- data.table(c("a","a","b","c","d","d"),c(1,2,2,3,4,9),c(1,1,0,0,7,3))
   V1 V2 V3
1:  a  1  1
2:  a  2  1
3:  b  2  0
4:  c  3  0
5:  d  4  7
6:  d  9  3

What I want to do

I need to detect where V3 contains a value > 0. For each run of consecutive rows where V3 > 0, I need to sum V3 over the run and subtract that from the sum of the corresponding rows in V2. If the result is lower than the 20% quantile of the sum of the corresponding rows in V2, those rows in V2 are set to 0; otherwise, the result is split evenly among the corresponding rows in V2.

Example

In V3, the first two rows are > 0. Their sum is 2, and the sum of the corresponding rows in V2 is 3. Subtracting gives 1, which is >= the 20% quantile, so we split it evenly between the two rows. The same applies to the last two rows (13 - 10 = 3, split into 1.5 each). The result should be:

   V1  V2 V3
1:  a 0.5  1
2:  a 0.5  1
3:  b 2.0  0
4:  c 3.0  0
5:  d 1.5  7
6:  d 1.5  3
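
The arithmetic of the example can be checked with a few lines of base R. This is only a sketch: `adjust_run` is a hypothetical helper name, and it assumes that "20% quantile of the sum" means 0.2 times the sum of V2 over the run, which is the interpretation the solution further down uses.

```r
# Hypothetical helper: apply the rule to one run of rows where V3 > 0.
# Assumption: the threshold is 0.2 * sum(v2), not a sample quantile.
adjust_run <- function(v2, v3) {
  remainder <- sum(v2) - sum(v3)   # what is left after subtracting V3
  threshold <- 0.2 * sum(v2)       # 20% of the run's V2 total
  if (remainder < threshold) {
    rep(0, length(v2))                      # below threshold: zero out the run
  } else {
    rep(remainder / length(v2), length(v2)) # otherwise split evenly
  }
}

first_run <- adjust_run(v2 = c(1, 2), v3 = c(1, 1))  # rows 1-2
last_run  <- adjust_run(v2 = c(4, 9), v3 = c(7, 3))  # rows 5-6
print(first_run)
print(last_run)
```

For rows 1-2 the remainder is 1 against a threshold of 0.6, and for rows 5-6 it is 3 against 2.6, so both runs are split rather than zeroed.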

EDIT

A 0 in V3 ends the current group. The first group is the first two rows, which contain 1 and 1 in V3; the 0 that follows ends that group. The next row is also 0, so it is skipped. After that comes a value > 0, which starts a new group that runs until the next 0.
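
One way to make this grouping rule concrete is to label each run of consecutive rows with base R's `rle`. The sketch below works on the V3 values from the example; the variable names `runs`, `run_id` and `is_active` are illustrative only.

```r
v3 <- c(1, 1, 0, 0, 7, 3)

# rle() compresses the logical vector into runs of identical values
runs <- rle(v3 > 0)

# Expand the run index back to one id per row
run_id <- rep(seq_along(runs$values), runs$lengths)

# TRUE for rows that belong to a run where V3 > 0
is_active <- rep(runs$values, runs$lengths)

print(data.frame(v3, run_id, is_active))
```

The two consecutive zeros end up in a single run of their own, which is harmless because rows with V3 equal to 0 are left unchanged.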


Solution

  • Assuming that V1 also reflects the locations where V3 > 0 (that is, each run of V3 > 0 coincides with a single value of V1), you can group by V1.

    A[V3 > 0, V2 := ifelse( (sum(V2) - sum(V3)) < 0.2*sum(V2), 0, (sum(V2) - sum(V3)) / .N),
        by=.(V1)]
    

    Or, if V1 cannot be relied upon, you can build a V1-like grouping on the fly with rle:

    A[, 
        V2 := ifelse(V3 <= 0, V2,
            ifelse(sum(V2) - sum(V3) < 0.2*sum(V2), 0, (sum(V2) - sum(V3)) / .N)),
        by=.(rep(seq_along(rle(V3 > 0)$values), rle(V3 > 0)$lengths))]
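
    As a possible variation on the rle approach, data.table's own `rleid()` can produce the run ids. The following is a sketch rather than a definitive implementation: the temporary column name `run` is an assumption, and the threshold is taken to be 0.2 * sum(V2) as in the code above.

```r
library(data.table)

A <- data.table(V1 = c("a", "a", "b", "c", "d", "d"),
                V2 = c(1, 2, 2, 3, 4, 9),
                V3 = c(1, 1, 0, 0, 7, 3))

# Label runs on the full table first, so the ids survive the row filter below
A[, run := rleid(V3 > 0)]

# Only rows with V3 > 0 are updated; each run is handled as one group
A[V3 > 0,
  V2 := {
    remainder <- sum(V2) - sum(V3)
    if (remainder < 0.2 * sum(V2)) 0 else remainder / .N
  },
  by = run]

A[, run := NULL]  # drop the temporary helper column
print(A)
```

    The run ids are stored in a column before filtering because `by` is evaluated on the rows selected by `i`: calling `rleid(V3 > 0)` directly in `by` together with `V3 > 0` in `i` would see only TRUE values and collapse every run into a single group.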