Tags: r, dataframe, dplyr, purrr, accumulate

Use purrr::accumulate with condition


Below is a reproducible example with a test dataframe:

dat<- structure(list(A = c(1.3, 1.5, 1.6, 1.2, 1.1, 1.2), 
                     B = c(0.25, 0.21, 0.21, 0.15, 0.26, 0.17)), 
                class = c("tbl_df", "tbl", "data.frame"), 
                row.names = c(NA, -6L))

I want to add a column C with an initial value, say 1000, and then use accumulate, but conditionally on column A.

Basically, if the value in column A is greater than or equal to 1.2, then C = 1000 - (B * 1000). That applies to the first row only: for the following rows, the previous value of C should be used instead of the initial value 1000. When A is below 1.2, C simply keeps the previous value.
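Written as a recurrence (the notation is mine: C_0 is the initial value 1000 and i runs over the rows), the rule reads as below. Since 1000 - B * 1000 = 1000 * (1 - B), a row with A >= 1.2 multiplies the running value by (1 - B_i), while a row with A < 1.2 multiplies it by 1, so the recurrence unrolls into a running product:

$$
C_0 = 1000, \qquad
C_i =
\begin{cases}
C_{i-1}\,(1 - B_i) & \text{if } A_i \ge 1.2 \\
C_{i-1} & \text{otherwise}
\end{cases}
\qquad\Longrightarrow\qquad
C_n = 1000 \prod_{i=1}^{n} \bigl(1 - B_i \,\mathbf{1}[A_i \ge 1.2]\bigr)
$$

For the first row this gives C_1 = 1000 * (1 - 0.25) = 750, and for the fifth row (A = 1.1) it gives C_5 = C_4.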

My desired output would be something like this:

A B C
1.3 0.25 750
1.5 0.21 592.5
1.6 0.21 468.075
1.2 0.15 397.864
1.1 0.26 397.864
1.2 0.17 330.227

(So the first row should be 1000 if its value in column A is less than 1.2.)
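To pin down the rule before reaching for accumulate, here is a plain loop version. It is only a sketch: it uses a base R data.frame copy of the data (so no packages are needed), and the helper name conditional_decay and its init/threshold arguments are hypothetical, chosen for illustration.

```r
# Data from the question, as a plain data.frame (base R only)
dat <- data.frame(A = c(1.3, 1.5, 1.6, 1.2, 1.1, 1.2),
                  B = c(0.25, 0.21, 0.21, 0.15, 0.26, 0.17))

# Hypothetical helper: start from `init` and, row by row, shrink the running
# value by the fraction B only when A >= threshold.
conditional_decay <- function(A, B, init = 1000, threshold = 1.2) {
  out <- numeric(length(A))
  prev <- init
  for (i in seq_along(A)) {
    if (A[i] >= threshold) {
      prev <- prev * (1 - B[i])  # same as prev - prev * B[i]
    }
    out[i] <- prev               # rows with A < threshold carry prev forward
  }
  out
}

dat$C <- conditional_decay(dat$A, dat$B)
dat
```

A loop like this is easy to check by hand, which makes it a handy reference when testing the vectorised versions.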

I've tried to use accumulate, but I can't get the conditional part to work:

dat<-dat %>%
  mutate(C = accumulate(tail(B,-1), .f = ~ .x - (.x * .y), .init = 1000))
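The attempt above has two problems: tail(B, -1) drops the first value of B, and the function never looks at A. One way to keep the accumulate structure (my own variant, not the answer's approach) is to accumulate over row positions so the step function can read both A and B. The helper name next_value is hypothetical; the sketch relies only on purrr::accumulate() and its .init argument, and uses a plain data.frame so dplyr is not required.

```r
library(purrr)

dat <- data.frame(A = c(1.3, 1.5, 1.6, 1.2, 1.1, 1.2),
                  B = c(0.25, 0.21, 0.21, 0.15, 0.26, 0.17))

# Hypothetical step function: acc is the running value, i the current row.
# The condition on A lives inside the function.
next_value <- function(acc, i) {
  if (dat$A[i] >= 1.2) acc * (1 - dat$B[i]) else acc
}

# .init = 1000 prepends the starting value to the result,
# so drop it with [-1] to get exactly one value per row.
dat$C <- accumulate(seq_len(nrow(dat)), next_value, .init = 1000)[-1]
dat
```

The answer below avoids the if/else entirely by folding the condition into the input vector, which is shorter and also works with cumprod.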

Solution

  • If you would like to use accumulate:

    > dat %>%
    +     mutate(C = accumulate(B * (A >= 1.2), ~ .x * (1 - .y), .init = 1000)[-1])
    # A tibble: 6 × 3
          A     B     C
      <dbl> <dbl> <dbl>
    1   1.3  0.25  750
    2   1.5  0.21  592.
    3   1.6  0.21  468.
    4   1.2  0.15  398.
    5   1.1  0.26  398.
    6   1.2  0.17  330.
    

    With base R, you can do the same with Reduce or cumprod:

    transform(
        dat,
        C = Reduce(\(x, y) x * (1 - y),
            B * (A >= 1.2),
            init = 1000,
            accumulate = TRUE
        )[-1]
    )
    

    or

    transform(
        dat,
        C = cumprod(1 - B * (A >= 1.2)) * 1000
    )
    

    which gives

        A    B        C
    1 1.3 0.25 750.0000
    2 1.5 0.21 592.5000
    3 1.6 0.21 468.0750
    4 1.2 0.15 397.8638
    5 1.1 0.26 397.8638
    6 1.2 0.17 330.2269
    

    Data

    > dput(dat)
    structure(list(A = c(1.3, 1.5, 1.6, 1.2, 1.1, 1.2), B = c(0.25, 
    0.21, 0.21, 0.15, 0.26, 0.17)), row.names = c(NA, -6L), class = c("tbl_df",
    "tbl", "data.frame"))
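All three versions hinge on the same trick: multiplying B by the logical (A >= 1.2) coerces it to 0/1, so a row that fails the condition gets a factor of exactly 1 and leaves the running value unchanged. The small check below (object names are my own) shows the masked multipliers and confirms that the Reduce and cumprod versions agree. It writes function(x, y) instead of the \(x, y) shorthand, since that shorthand needs R 4.1.0 or later.

```r
dat <- data.frame(A = c(1.3, 1.5, 1.6, 1.2, 1.1, 1.2),
                  B = c(0.25, 0.21, 0.21, 0.15, 0.26, 0.17))

# Logical (A >= 1.2) becomes 0/1, so rows with A < 1.2 get multiplier 1
multiplier <- with(dat, 1 - B * (A >= 1.2))
print(multiplier)

# Reduce with init + accumulate returns the initial value first; drop it
via_reduce <- Reduce(function(x, y) x * (1 - y),
                     dat$B * (dat$A >= 1.2),
                     init = 1000, accumulate = TRUE)[-1]

# cumprod is the vectorised form of the same running product
via_cumprod <- cumprod(multiplier) * 1000

stopifnot(isTRUE(all.equal(via_reduce, via_cumprod)))
via_cumprod
```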