Search code examples
rlagdplyr

Generating an index of values to compound, but ignoring initial value in a scalable way


I am trying to generate an index that is derived from return data.

The new column I want to generate would be derived by taking 100 and then compounding this. For this example:

first value = 100

second value = first value*(1+10/100) = 110

third value = second value *(1+20/100) = 132 and so on

Note that the first return value of 5 needs to be ignored. I know I can just overwrite it to accomplish what I want, but I was wondering if there is a more elegant way to get the desired output.

I come close to getting what I want, but I need to ignore the first return of 5. The expected output of the new_col would be 100, 110, 132.

**Reproducible example**

    # Load package
    library(tidyverse)

    # Create data
    df <- data.frame(asset = c("A", "A", "A"), return = c(5,10,20))
    df

# Generate new column
test <- df %>%
  mutate(new_col = 100) %>%   #initialize
  mutate(new_col = ifelse(row_number(new_col) == 1,
                          new_col,
                          lag(new_col, 1) * cumprod((1 + return/100))
                          )
         )
test

Thanks in advance!


Solution

  • An option would be accumulate

    library(tidyverse)
    df %>% 
       mutate(newcol = accumulate(return[-1], ~ .x* (1 + .y/100), .init = 100))
    #  asset return newcol
    #1     A      5    100
    #2     A     10    110
    #3     A     20    132
    

    Or using cumprod

    df %>% 
        mutate(newcol = cumprod( c(100, 1 + return[-1]/100)))
    

    Or similar option in base R

    Reduce(function(x, y) x * (1 + y/100), df$return[-1], init = 100, accumulate = TRUE)
    #[1] 100 110 132
    

    Or with a for loop

    df$newcol[1] <- 100
    for(i in 2:nrow(df)) df$newcol[i] <- df$newcol[i-1] * (1 + df$return[i]/100)
    

    For multiple columns, use mutate_at

    df1 %>% 
        mutate_at(vars(starts_with('return')), 
          list(newcol = ~  accumulate(.[-1], ~ .x * (1+ .y/100), .init = 100)))
    #   asset return return2 return_newcol return2_newcol
    #1     A      5      15           100            100
    #2     A     10      12           110            112
    #3     A     20      25           132            140
    

    data

    df1 <- data.frame(asset = c("A", "A", "A"), 
          return = c(5,10,20), return2 = c(15, 12, 25))