rdataframedata.tablequantile# How to detect values in table's column and subtract them to another column?

**Data**

I'm working with data.table and I'm trying to make a quite difficult manipulation. My data looks like this:

```
A <- data.table(c("a","a","b","c","d","d"),c(1,2,2,3,4,9),c(1,1,0,0,7,3))
V1 V2 V3
1: a 1 1
2: a 2 1
3: b 2 0
4: c 3 0
5: d 4 7
6: d 9 3
```

**What I want to do**

I need to detect when there is a value >0 in V3. When it's the case, I need to sum values in V3 that are successively >0 and then subtract them to the sum of corresponding rows in V2, then if the result is lower than the 20% quantile of the sum of corresponding rows in V2, then corresponding rows in V2 get 0, else, split the result in among the corresponding rows in V2.

**Example**

In V3, we have the 2 first row >0. We sum it -> 2, the sum of corresponding row is 3. We subtract it, we obtain 1 which is >= to the 20% quantile so we split it and same for below. The result might be:

```
V1 V2 V3
1: a 0.5 1
2: a 0.5 1
3: b 2.0 0
4: c 3.0 0
5: d 1.5 7
6: d 1.5 3
```

**EDIT**

When there is a 0, the group of value is finished. I mean, the first group is the first 2 rows, containing 1 and 1 in V3, then there is 0 so the group is finished. Then there is another 0 so we don't take it and then, there is a value >0 which means that another group starts here until another 0.

Solution

Assuming that V1 is also reflecting the locations where V3 > 0, you can use V1 in your grouping by.

```
A[V3 > 0, V2 := ifelse( (sum(V2) - sum(V3)) < 0.2*sum(V2), 0, (sum(V2) - sum(V3)) / .N),
by=.(V1)]
```

Or if V1 cannot be relied upon, you can create a V1-like grouping on the fly as follows, using `rle`

```
A[,
V2 := ifelse(V3 <= 0, V2,
ifelse(sum(V2) - sum(V3) < 0.2*sum(V2), 0, (sum(V2) - sum(V3)) / .N)),
by=.(rep(seq_along(rle(V3 > 0)$values), rle(V3 > 0)$lengths))]
```

- Join lines in R plot based in order in data frame
- How to correctly install rpy2 in python?
- bslib theme not coloring page_navbar as expected
- How can I generate a random number with restrictions?
- subset of data.frame columns to maximize "complete" observations
- can't execute rsDriver (connection refused)
- Select the data.frame with maximum dimensions from a list of data.frames
- nTrials must be be greater.... issue on conjoint design
- error in `na_if()`: ! Can't convert `y` <character> to match type of `x` <double>
- Getting errors installing glmnet package using R studio
- R plotly show only labels in the stacked bar chart where percentage value for the stack is above 5
- Is there a way to add superscript and subscript to the same element using ggtext,<sub>&<sup>?
- ggplot labels with "K" for thousands or "M" for millions (maintaining 'comma' y-axis labeling)
- Automated procedure to draw ray in ggplot
- group_by and pivot_wider combination
- lm(): What is qraux returned by QR decomposition in LINPACK / LAPACK
- How to apply different functions for different groups and mutate?
- R studio time column is between
- Optimizing nested ifelse in shiny
- Expanding dataset rows based on intervals in R
- Free scales for facets in facet_grid change pattern
- Sort a factor based on value in one or more other columns
- SQL query on arrow duckdb workflow in R
- Rolling left join with value fill for all the rows in the left table
- GLM object in R takes more RAM than the object size of the GLM object
- Can you set an inferior limit to scale breaks on ggplot with facet_wrap?
- How to plot a continuous color wheel with specific colors in ggplot2
- Retain unused factor levels in legend when using scale_*_manual
- Check values of column 1 present in values of column 2
- Conditional panel based on a selectInput with emoji