I have to produce a self-referencing variable (ind) that is grouped by an id and has to fulfill a certain condition (e.g., time >1). Here is a toy example:
set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 4), time = rep(1:4, 2), ret = rnorm(8)/100)
dt$ind <- if_else(dt$time == 1, 100, as.numeric(NA))
dt
dt <- dt %>%
group_by(id) %>%
mutate(
ind = if_else(time > 1, lag(ind, 1)*(1+ret), ind)
)
This is the output:
Obviously I cannot use mutate in this set up since it is referencing to the initial values of ind and does not update when new values are calculated.
I would like to avoid running a loop. Any ideas how I can compute ind for all time periods most efficiently?
Thanks to everyone for the helpful answers! I have a slightly trickier extension of the above issue.
How can I deal with higher lags? E.g., with lag = 2, such that
index_{t} = index_{t-2}*(1+ret_{t})
Here is a sample data frame and a sample outcome that I produced with Excel:
set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- if_else(dt$time == 1, 120, if_else(dt$time == 2, 125, as.numeric(NA)))
As a workaround, you can use following trick in edited circumstances. Note you may change this for any number of simultaneous series
seq(n()) %% 2
set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 5), time = rep(1:5, 2), ret = rnorm(10)/100)
dt$ind <- ifelse(dt$time == 1, 120, ifelse(dt$time == 2, 125, as.numeric(NA)))
library(dplyr, warn.conflicts = F)
dt %>% group_by(id) %>%
group_by(d = seq(n()) %% 2, .add = TRUE) %>%
mutate(ind = cumprod(1 + duplicated(id) * ret)* ind[1])
#> # A tibble: 10 x 5
#> # Groups: id, d [4]
#> id time ret ind d
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 a 1 0.00554 120 1
#> 2 a 2 -0.00280 125 0
#> 3 a 3 0.0178 122. 1
#> 4 a 4 0.00187 125. 0
#> 5 a 5 0.0114 124. 1
#> 6 b 1 0.00416 120 0
#> 7 b 2 0.0123 125 1
#> 8 b 3 0.00237 120. 0
#> 9 b 4 -0.00365 125. 1
#> 10 b 5 0.0111 122. 0
purrr
library(tidyverse)
set.seed(13)
dt <- data.frame(id = rep(letters[1:2], each = 4), time = rep(1:4, 2), ret = rnorm(8)/100)
dt$ind <- if_else(dt$time == 1, 100, as.numeric(NA))
dt
#> id time ret ind
#> 1 a 1 0.005543269 100
#> 2 a 2 -0.002802719 NA
#> 3 a 3 0.017751634 NA
#> 4 a 4 0.001873201 NA
#> 5 b 1 0.011425261 100
#> 6 b 2 0.004155261 NA
#> 7 b 3 0.012295066 NA
#> 8 b 4 0.002366797 NA
dt %>% group_by(id) %>%
mutate(ind = cumprod(1 + duplicated(id) * ret)* ind[1])
#> # A tibble: 8 x 4
#> # Groups: id [2]
#> id time ret ind
#> <chr> <int> <dbl> <dbl>
#> 1 a 1 0.00554 100
#> 2 a 2 -0.00280 99.7
#> 3 a 3 0.0178 101.
#> 4 a 4 0.00187 102.
#> 5 b 1 0.0114 100
#> 6 b 2 0.00416 100.
#> 7 b 3 0.0123 102.
#> 8 b 4 0.00237 102.
Created on 2021-07-27 by the reprex package (v2.0.0)