Using accumulate function with second to last value as .init argument


I recently came across an interesting problem: calculating a vector in which each value depends on its own second-to-last (penultimate) value, used in the role of the .init argument, and on the current value of another vector. Here is the sample data set:

library(dplyr)  # provides if_else()

set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- if_else(dt$time == 1, 120, if_else(dt$time == 2, 125, NA_real_))

   id time          ret ind
1   a    1  0.005543269 120
2   a    2 -0.002802719 125
3   a    3  0.017751634  NA
4   a    4  0.001873201  NA
5   a    5  0.011425261  NA
6   b    1  0.004155261 120
7   b    2  0.012295066 125
8   b    3  0.002366797  NA
9   b    4 -0.003653828  NA
10  b    5  0.011051443  NA

What I would like to calculate is:

ind_{t} = ind_{t-2}*(1+ret_{t})

I tried the following code. Since .init is of no use here, I tried to neutralize the original .init and create a virtual one, but unfortunately the newly created values (from the third row downward) are not carried forward into the calculation:

dt %>%
  group_by(id) %>%
  mutate(ind = c(120, accumulate(3:n(), .init = 125, 
                                 ~ .x * 1/.x * ind[.y - 2] * (1 + ret[.y]))))

# A tibble: 10 x 4
# Groups:   id [2]
   id     time      ret   ind
   <chr> <int>    <dbl> <dbl>
 1 a         1  0.00554  120 
 2 a         2 -0.00280  125 
 3 a         3  0.0178   122.
 4 a         4  0.00187  125.
 5 a         5  0.0114    NA 
 6 b         1  0.00416  120 
 7 b         2  0.0123   125 
 8 b         3  0.00237  120.
 9 b         4 -0.00365  125.
10 b         5  0.0111    NA 

I was wondering whether there is a tweak that would make this code work completely. I would greatly appreciate your help.


Solution

  • Use a state vector consisting of the current value of ind and the prior value of ind. That way, at each step the prior state still contains the second prior value of ind, which is the one the formula needs. We encode each state as a complex number whose real part is the current value of ind and whose imaginary part is the prior value of ind. At the end we take the real part.

    library(dplyr)
    library(purrr)
    
    dt %>%
      group_by(id) %>%
      mutate(result = c(ind[1],
                        Re(accumulate(.x = tail(ret, -2), 
                                      .f = ~ Im(.x) * (1 + .y) + Re(.x) * 1i,
                                      .init = ind[2] + ind[1] * 1i)))) %>%
      ungroup()
    

    giving:

    # A tibble: 10 x 5
       id     time      ret   ind result
       <chr> <int>    <dbl> <dbl>  <dbl>
     1 a         1  0.00554   120   120 
     2 a         2 -0.00280   125   125 
     3 a         3  0.0178     NA   122.
     4 a         4  0.00187    NA   125.
     5 a         5  0.0114     NA   124.
     6 b         1  0.00416   120   120 
     7 b         2  0.0123    125   125 
     8 b         3  0.00237    NA   120.
     9 b         4 -0.00365    NA   125.
    10 b         5  0.0111     NA   122.
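
To make the state encoding easier to follow, here is a minimal sketch that runs the same idea on plain vectors for group a only, without dplyr. The names ind0, advance and states are illustrative and not part of the answer above, and the ret values are copied from the printed data in the question, so they are rounded.

```r
library(purrr)

# Values for group "a" as printed in the question (rounded there).
ind0 <- c(120, 125)
ret  <- c(0.005543269, -0.002802719, 0.017751634, 0.001873201, 0.011425261)

# One step of the recurrence on a complex state:
#   real part      = current ind
#   imaginary part = prior ind
# The new current value is the prior value times (1 + r); the old current
# value moves into the imaginary slot.
advance <- function(state, r) {
  complex(real = Im(state) * (1 + r), imaginary = Re(state))
}

states <- accumulate(tail(ret, -2), advance,
                     .init = complex(real = ind0[2], imaginary = ind0[1]))
states <- unlist(states)  # no-op if accumulate() already returned a complex vector

# Re(states) holds ind[2], ind[3], ...; Im(states) is the same series lagged by one.
print(data.frame(current = Re(states), prior = Im(states)))

result <- c(ind0[1], Re(states))
print(result)
```

Printing the real and imaginary parts side by side shows that the imaginary column is simply the real column shifted down by one row, which is what lets each step reach back two positions.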
    

    Variation

    This variation eliminates the complex numbers. Each state is a numeric vector of 2 elements: the first corresponds to the real part in the prior solution and the second to the imaginary part. This approach could be extended to cases where we need more than 2 numbers per state and where the dependence involves all of the last N values. For the question here, however, it has a downside: it needs an extra line of code to extract the result from the list of pairs, which is more involved than using Re in the prior solution.

    dt %>%
      group_by(id) %>%
      mutate(result = c(ind[1],
                        accumulate(.x = tail(ret, -2), 
                                   .f = ~ c(.x[2] * (1 + .y), .x[1]),
                                   .init = ind[2:1])),
             result = map_dbl(result, first)) %>%
      ungroup()
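
As a rough sketch of the extension mentioned above, the vector state can hold the last k values instead of the last 2. The helper lag_recur below is hypothetical (it is not a purrr or dplyr function) and assumes the recurrence x[t] = x[t - k] * (1 + ret[t]), seeded with the first k values. Since the whole window is available in the state, the step function could equally use any of the last k values.

```r
library(purrr)

# Hypothetical helper: x[t] = x[t - k] * (1 + ret[t]) for t > k,
# where k = length(init) and init supplies x[1], ..., x[k].
lag_recur <- function(init, ret) {
  k <- length(init)
  stopifnot(k >= 1, length(ret) > k)

  # The state is ordered newest first: c(x[t], x[t - 1], ..., x[t - k + 1]).
  # Each step computes the new value from the oldest entry, state[k],
  # puts it in front, and drops the oldest entry.
  states <- accumulate(
    tail(ret, -k),
    function(state, r) c(state[k] * (1 + r), state[-k]),
    .init = rev(init)
  )

  # The first entry of each state is the value for that time step;
  # the first state contributes x[k], so only x[1..k-1] are prepended.
  c(init[-k], map_dbl(states, 1))
}

# With two seed values this reproduces the recurrence from the question.
ret_a <- c(0.005543269, -0.002802719, 0.017751634, 0.001873201, 0.011425261)
print(lag_recur(c(120, 125), ret_a))
```

The extraction step stays the same for any k, so the cost of the vector state relative to the complex-number trick does not grow with the window size.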
    

    Check

    We check that the results above are correct. Alternatively, this could be used as a straightforward solution.

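    calc <- function(ind, ret) {
      # seq_along(ret)[-(1:2)] is empty for groups with fewer than 3 rows,
      # whereas seq(3, length(ret)) would count down (e.g. 3, 2) in that case
      for (i in seq_along(ret)[-(1:2)]) ind[i] <- ind[i - 2] * (1 + ret[i])
      ind
    }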
    
    dt %>%
      group_by(id) %>%
      mutate(result = calc(ind, ret)) %>%
      ungroup()
    

    giving:

    # A tibble: 10 x 5
       id     time      ret   ind result
       <chr> <int>    <dbl> <dbl>  <dbl>
     1 a         1  0.00554   120   120 
     2 a         2 -0.00280   125   125 
     3 a         3  0.0178     NA   122.
     4 a         4  0.00187    NA   125.
     5 a         5  0.0114     NA   124.
     6 b         1  0.00416   120   120 
     7 b         2  0.0123    125   125 
     8 b         3  0.00237    NA   120.
     9 b         4 -0.00365    NA   125.
    10 b         5  0.0111     NA   122.
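
Rather than comparing the printed tables by eye, the two approaches can also be compared programmatically. The sketch below rebuilds the sample data, redefines calc so that it is self-contained, and computes both versions side by side; the column names via_loop and via_accumulate are illustrative.

```r
library(dplyr)
library(purrr)

set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5),
                 time = rep(1:5, 2),
                 ret = rnorm(10) / 100)
dt$ind <- if_else(dt$time == 1, 120, if_else(dt$time == 2, 125, NA_real_))

# Plain loop version of ind[t] = ind[t - 2] * (1 + ret[t]).
calc <- function(ind, ret) {
  for (i in seq_along(ret)[-(1:2)]) ind[i] <- ind[i - 2] * (1 + ret[i])
  ind
}

both <- dt %>%
  group_by(id) %>%
  mutate(via_loop = calc(ind, ret),
         via_accumulate = c(ind[1],
                            Re(accumulate(.x = tail(ret, -2),
                                          .f = ~ Im(.x) * (1 + .y) + Re(.x) * 1i,
                                          .init = ind[2] + ind[1] * 1i)))) %>%
  ungroup()

# all.equal() compares with a small numeric tolerance instead of exact equality.
print(all.equal(both$via_loop, both$via_accumulate))
```

Using all.equal rather than identical avoids spurious mismatches from floating-point rounding if the two computations ever differ in the order of operations.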