Search code examples
rpurrraccumulate

Accumulate based on lagged values and values in different columns in R


I have the following dataset:

df <- data.frame(fractile = 1:10) %>% mutate(bracketaverage = fractile*100, topaverage = 10:19)

Which looks like:

 fractile bracketaverage topaverage
1         1            100         10
2         2            200         11
3         3            300         12
4         4            400         13
5         5            500         14
6         6            600         15
7         7            700         16
8         8            800         17
9         9            900         18
10       10           1000         19

I wish to modify the topaverage column by utilizing input from the remaining columns according to the specified formula:

topaverage = ( (100-lag(fractile) ) * lag(topaverage) + ( lag(fractile) - fractile ) * bracket_average ) / (100-fractile)

Hence, the formula calculates the topaverage recursively, meaning that each iteration relies on the topaverage from the preceding step.


Solution

  • While many of purrr's functions have func2-versions for two arguments, two factors make it not easy to use them here: you need three variables, and you need both the lag and current version of at least one of them. Because of that, I think we can purrr::accumulate over the row indices themselves.

    library(dplyr)
    library(purrr) # accumulate
    df |>
      mutate(
        val = accumulate(
          row_number(), .init = first(topaverage),
          .f = function(ta, i) if (i == 1) ta else {
            ( (100 - fractile[i-1]) * topaverage[i-1] +
                (fractile[i-1] - fractile[i]) * bracketaverage[i] ) /
              (100 - fractile[i])
          })[-1]
      )
    #    fractile bracketaverage topaverage       val
    # 1         1            100         10 10.000000
    # 2         2            200         11  8.061224
    # 3         3            300         12  8.020619
    # 4         4            400         13  7.958333
    # 5         5            500         14  7.873684
    # 6         6            600         15  7.765957
    # 7         7            700         16  7.634409
    # 8         8            800         17  7.478261
    # 9         9            900         18  7.296703
    # 10       10           1000         19  7.088889
    

    where fractile[i-1] is giving us lag(fractile) and fractile[i] is giving us the current-row fractile.

    I assume that the first topaverage is retained unmodified. If that is not true, then this might need adjusting to account for aligning in the other direction ([i] and [i+1] instead, updating the leading conditional).