Search code examples
rdplyrlag

R Lag value of previous calculated function


I am trying to use a lag value of a previous row, which needs to be calculated from the previous row (unless its first entry).

I was trying something similar to:

test<-data.frame(account_id=c(123,123,123,123,444,444,444,444),entry=c(1,2,3,4,1,2,3,4),beginning_balance=c(100,0,0,0,200,0,0,0),
                 deposit=c(10,20,5,8,10,12,20,4),running_balance=c(0,0,0,0,0,0,0,0))

test2<-test %>%
  group_by(account_id) %>%
  mutate(running_balance = if_else(entry==1, beginning_balance+deposit,
                                   lag(running_balance)+deposit))

print(test2)

the running balance should be 110,130,135,143,210,222,242,246


Solution

  • For each account_id you can add first beginning_balance with cumulative sum of deposit.

    library(dplyr)
    
    test %>%
      group_by(account_id) %>%
      mutate(running_balance = first(beginning_balance) + cumsum(deposit))
    
    
    #  account_id entry beginning_balance deposit running_balance
    #       <dbl> <dbl>             <dbl>   <dbl>           <dbl>
    #1        123     1               100      10             110
    #2        123     2                 0      20             130
    #3        123     3                 0       5             135
    #4        123     4                 0       8             143
    #5        444     1               200      10             210
    #6        444     2                 0      12             222
    #7        444     3                 0      20             242
    #8        444     4                 0       4             246
    

    Same thing using data.table :

    library(data.table)
    setDT(test)[, running_balance := first(beginning_balance) + cumsum(deposit), account_id]