Search code examples
rpurrrlagself-referenceaccumulate

How to produce a self-referencing variable in R (e.g., index levels given returns)?


I have to produce a self-referencing variable (ind) that is grouped by an id and has to fulfill a certain condition (e.g., time >1). Here is a toy example:

set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 4), time = rep(1:4, 2), ret = rnorm(8)/100)
dt$ind <- if_else(dt$time == 1, 100, as.numeric(NA))
dt

dt <- dt %>%
  group_by(id) %>%
  mutate(
    ind = if_else(time > 1, lag(ind, 1)*(1+ret), ind)
  )

This is the output:

Values for ind missing

Obviously I cannot use mutate in this set up since it is referencing to the initial values of ind and does not update when new values are calculated.

I would like to avoid running a loop. Any ideas how I can compute ind for all time periods most efficiently?


Edit:

Thanks to everyone for the helpful answers! I have a slightly trickier extension of the above issue.

How can I deal with higher lags? E.g., with lag = 2, such that

index_{t} = index_{t-2}*(1+ret_{t})

Here is a sample data frame and a sample outcome that I produced with Excel:

set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- if_else(dt$time == 1, 120, if_else(dt$time == 2, 125, as.numeric(NA)))

enter image description here


Solution

  • As a workaround, you can use following trick in edited circumstances. Note you may change this for any number of simultaneous series

    • I just added an extra group_by statement based on a modulo sequence of required number of variables using seq(n()) %% 2
    set.seed(13)
    dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
    dt$ind <- ifelse(dt$time == 1, 120, ifelse(dt$time == 2, 125, as.numeric(NA)))
    library(dplyr, warn.conflicts = F)
    
    dt %>% group_by(id) %>%
      group_by(d = seq(n()) %% 2, .add = TRUE) %>%
      mutate(ind = cumprod(1 + duplicated(id) * ret)* ind[1])
    #> # A tibble: 10 x 5
    #> # Groups:   id, d [4]
    #>    id     time      ret   ind     d
    #>    <chr> <int>    <dbl> <dbl> <dbl>
    #>  1 a         1  0.00554  120      1
    #>  2 a         2 -0.00280  125      0
    #>  3 a         3  0.0178   122.     1
    #>  4 a         4  0.00187  125.     0
    #>  5 a         5  0.0114   124.     1
    #>  6 b         1  0.00416  120      0
    #>  7 b         2  0.0123   125      1
    #>  8 b         3  0.00237  120.     0
    #>  9 b         4 -0.00365  125.     1
    #> 10 b         5  0.0111   122.     0
    

    OLD answer: Without using purrr

    library(tidyverse)
    
    set.seed(13)
    dt <- data.frame(id = rep(letters[1:2], each = 4), time = rep(1:4, 2), ret = rnorm(8)/100)
    dt$ind <- if_else(dt$time == 1, 100, as.numeric(NA))
    dt
    #>   id time          ret ind
    #> 1  a    1  0.005543269 100
    #> 2  a    2 -0.002802719  NA
    #> 3  a    3  0.017751634  NA
    #> 4  a    4  0.001873201  NA
    #> 5  b    1  0.011425261 100
    #> 6  b    2  0.004155261  NA
    #> 7  b    3  0.012295066  NA
    #> 8  b    4  0.002366797  NA
    
    dt %>% group_by(id) %>%
      mutate(ind = cumprod(1 + duplicated(id) * ret)* ind[1])
    #> # A tibble: 8 x 4
    #> # Groups:   id [2]
    #>   id     time      ret   ind
    #>   <chr> <int>    <dbl> <dbl>
    #> 1 a         1  0.00554 100  
    #> 2 a         2 -0.00280  99.7
    #> 3 a         3  0.0178  101. 
    #> 4 a         4  0.00187 102. 
    #> 5 b         1  0.0114  100  
    #> 6 b         2  0.00416 100. 
    #> 7 b         3  0.0123  102. 
    #> 8 b         4  0.00237 102.
    

    Created on 2021-07-27 by the reprex package (v2.0.0)