I am trying to achieve the following in R. I have a dataset like the one below
and I'm trying to get the following result:
Conditions are:
Constraints: I do not want to use nested for loops. Is there a better way to achieve the result in R?
Thank you in advance for the suggestions/solutions.
It's much easier to help if your data is reproducible. The data in the images doesn't have column names, but the following code reproduces the data frame and names the columns col1, col2 and col3:
df <- data.frame(col1 = rep(1:17, 2),
                 col2 = rep(c("A", "B"), each = 17),
                 col3 = rep(rep(c(FALSE, TRUE), 4),
                            times = c(2, 10, 3, 4, 5, 3, 3, 4)))
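As a quick sanity check on the reconstructed data, the runs in col3 can be inspected with base R's rle(). This is only a sketch for verifying the data frame above; the name runs is introduced here for illustration.

```r
# Rebuild the data frame so this snippet runs on its own
df <- data.frame(col1 = rep(1:17, 2),
                 col2 = rep(c("A", "B"), each = 17),
                 col3 = rep(rep(c(FALSE, TRUE), 4),
                            times = c(2, 10, 3, 4, 5, 3, 3, 4)))

# rle() summarises consecutive runs of identical values
runs <- rle(df$col3)

runs$lengths  # length of each run, in order
runs$values   # the value (FALSE or TRUE) repeated in each run
nrow(df)      # total number of rows
```

The run lengths should come back as the times vector passed to rep(), alternating FALSE and TRUE.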
To do what you need, we can group the data according to each run of TRUE or FALSE in column 3 using rleid from data.table, then insert a reverse integer sequence (built with rev and seq) for the FALSE segments. Finally, swap the run grouping for a grouping by col2, then shift the new column up one position within each group using lead.
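To make the individual steps easier to follow, here is a small base R sketch on a toy vector. It is not the code used in this answer: the run ids are computed with cumsum() as a stand-in for data.table::rleid, and the names x, run_id, countdown, unshifted and shifted are made up for illustration.

```r
# Toy logical vector: a FALSE run, a TRUE run, then another FALSE run
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Run id: increases by one each time the value changes
# (base R stand-in for data.table::rleid)
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Reverse count within each run: n, n - 1, ..., 1
countdown <- ave(seq_along(x), run_id, FUN = function(i) rev(seq_along(i)))

# Keep the countdown only where x is FALSE; TRUE positions become 0
unshifted <- countdown * (1 - x)

# Shift up one position, padding the end with 0 (what lead(default = 0) does)
shifted <- c(unshifted[-1], 0)

run_id     # 1 1 2 2 2 3 3 3
unshifted  # 2 1 0 0 0 3 2 1
shifted    # 1 0 0 0 3 2 1 0
```

After the shift, the last TRUE before a FALSE run holds the length of that run, and each FALSE row holds the number of FALSE rows still to come after it.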
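library(tidyverse)

df <- df %>%
  # One group per consecutive run of identical col3 values
  group_by(grp = data.table::rleid(col3)) %>%
  # Count down to 1 within each run; (1 - col3) zeroes the TRUE runs
  mutate(col4 = rev(seq(n())) * (1 - col3)) %>%
  # Regroup by col2 so the shift below stays within A and within B
  group_by(col2) %>%
  # Shift up one row; the last row of each group gets 0
  mutate(col4 = lead(col4, default = 0)) %>%
  select(-grp)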
The result matches your expected output:
print(df, n = 34)
#> # A tibble: 34 x 4
#>     col1 col2  col3   col4
#>    <int> <chr> <lgl> <dbl>
#>  1     1 A     FALSE     1
#>  2     2 A     FALSE     0
#>  3     3 A     TRUE      0
#>  4     4 A     TRUE      0
#>  5     5 A     TRUE      0
#>  6     6 A     TRUE      0
#>  7     7 A     TRUE      0
#>  8     8 A     TRUE      0
#>  9     9 A     TRUE      0
#> 10    10 A     TRUE      0
#> 11    11 A     TRUE      0
#> 12    12 A     TRUE      3
#> 13    13 A     FALSE     2
#> 14    14 A     FALSE     1
#> 15    15 A     FALSE     0
#> 16    16 A     TRUE      0
#> 17    17 A     TRUE      0
#> 18     1 B     TRUE      0
#> 19     2 B     TRUE      5
#> 20     3 B     FALSE     4
#> 21     4 B     FALSE     3
#> 22     5 B     FALSE     2
#> 23     6 B     FALSE     1
#> 24     7 B     FALSE     0
#> 25     8 B     TRUE      0
#> 26     9 B     TRUE      0
#> 27    10 B     TRUE      3
#> 28    11 B     FALSE     2
#> 29    12 B     FALSE     1
#> 30    13 B     FALSE     0
#> 31    14 B     TRUE      0
#> 32    15 B     TRUE      0
#> 33    16 B     TRUE      0
#> 34    17 B     TRUE      0
Created on 2022-09-07 with reprex v2.0.2
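If you would rather avoid the tidyverse and data.table dependencies, the same idea can be sketched in base R with ave(). This is an alternative under the same assumptions about the data, not the code that produced the output above; the helper names run_id and countdown are hypothetical.

```r
# Same reproducible data as above
df <- data.frame(col1 = rep(1:17, 2),
                 col2 = rep(c("A", "B"), each = 17),
                 col3 = rep(rep(c(FALSE, TRUE), 4),
                            times = c(2, 10, 3, 4, 5, 3, 3, 4)))

# Run id over the whole column (base R stand-in for data.table::rleid)
run_id <- cumsum(c(TRUE, df$col3[-1] != df$col3[-nrow(df)]))

# Reverse count within each run, kept only for the FALSE runs
countdown <- ave(seq_len(nrow(df)), run_id,
                 FUN = function(i) rev(seq_along(i)))
df$col4 <- countdown * (1 - df$col3)

# Shift up one row within each col2 group, padding the last row with 0
df$col4 <- ave(df$col4, df$col2, FUN = function(v) c(v[-1], 0))

df
```

As in the dplyr version, the run ids are computed over the whole column while the shift is done per col2 group, so a countdown never leaks from group A into group B.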