Search code examples
rdplyraccumulate

How to use dplyr mutate function in R to calculate a running balance?


In the MWE code at the bottom, I'm trying to generate a running balance for each unique id when running from one row to the next. For example, when running the below code the output should be:

data2 <- 

 id plusA plusB minusC running_balance  [desired calculation for running balance]
  1     3     5    10              -2   3 + 5 - 10 = -2
  2     4     5     9               0   4 + 5 - 9 = 0
  3     8     5     8               5   8 + 5 - 8 = 5
  3     1     4     7               3   id doesn't change so 5 from above + (1 + 4 - 7) = 3
  3     2     5     6               4   id doesn't change so 3 from above + (2 + 5 - 6) = 4
  5     3     6     5               4   3 + 6 - 5 = 4

The below MWE refers to, when id is consistent from one row to the next, the prior row plusA amount rather than the prior row running_balance amount. I've tried changing the below to some form of lag(running_balance...) without luck yet.

I'm trying to minimize the use of too many packages. For example I understand the purrr package offers an accumulate() function, but I'd rather stick to only dplyr for now. Is there a simple way to do this, using dplyr mutate() in my case? I also tried fiddling around with the dplyr cumsum() function which should work here but I'm unsure of how to string several of them together.

MWE code:

data <- data.frame(id=c(1,2,3,3,3,5), 
                   plusA=c(3,4,8,1,2,3), 
                   plusB=c(5,5,5,4,5,6),
                   minusC = c(10,9,8,7,6,5))

library(dplyr)
data2<- subset(
  data %>% mutate(extra=case_when(id==lag(id) ~ lag(plusA), TRUE ~ 0)) %>%
    mutate(running_balance=plusA+plusB-minusC+extra),
  select = -c(extra)
)  

Solution

  • Using dplyr:

    data %>% 
      mutate(running_balance = plusA + plusB - minusC) %>% 
      group_by(id) %>% 
      mutate(running_balance = cumsum(running_balance)) %>% 
      ungroup()
    

    Output:

    # A tibble: 6 x 5
    # Groups:   id [4]
         id plusA plusB minusC running_balance
      <dbl> <dbl> <dbl>  <dbl>           <dbl>
    1     1     3     5     10              -2
    2     2     4     5      9               0
    3     3     8     5      8               5
    4     3     1     4      7               3
    5     3     2     5      6               4
    6     5     3     6      5               4