Search code examples
rdplyrdatasetmutate

How to add a column to a dataset which adds values from one column and subtracts values from another column in R?


Here is a small example of the kind of data I have:

transactions <- tibble(id = seq(1:7),
                       day = paste(rep("day", each = 7), seq(1:7), sep = ""),
                       sent_to = c(NA, "Garden Cinema", "Pasta House", NA, "Blue Superstore", "Jane", "Joe"),
                       received_from = c("ATM", NA, NA, "Sarah", NA, NA, NA),
                       reference = c("add_cash", "cinema_tickets", "meal", "gift", "shopping", "reimbursed", "reimbursed"),
                       decrease = c(NA, 10.8, 12.5, NA, 15.25, NA, NA),
                       increase = c(50, NA, NA, 30, NA, 5.40, 7.25))

# # A tibble: 7 × 7
#      id   day   sent_to         received_from reference      decrease  increase
#    <int>  <chr> <chr>           <chr>         <chr>          <dbl>     <dbl>   
# 1     1   day1  NA              ATM           add_cash       NA        50      
# 2     2   day2  Garden Cinema   NA            cinema_tickets 10.8      NA      
# 3     3   day3  Pasta House     NA            meal           12.5      NA      
# 4     4   day4  NA              Sarah         gift           NA        30      
# 5     5   day5  Blue Superstore NA            shopping       15.2      NA      
# 6     6   day6  Jane            NA            reimbursed     NA        5.4     
# 7     7   day7  Joe             NA            reimbursed     NA        7.25    

I would like to add a "balance" column to this dataset where:

  • Row 1: starts with 50
  • Row 2: has previous balance amount + increase - decrease
  • Row 3, etc.: same as row 2 formula

I've been struggling to do this myself as I don't know if there are any existing functions which help with this types of data manipulation. The only function that comes to mind is the dplyr::lag() but I'm not sure how to use it.

Any help is appreciated :)


Solution

  • You could first create a column of the change and second use purrr::accumulate to create your balance column:

    library(dplyr, warn = FALSE)
    library(purrr)
    
    transactions |> 
      mutate(change = coalesce(increase, -decrease),
             balance = accumulate(change, ~ .x + .y))
    #> # A tibble: 7 × 9
    #>      id day   sent_to         received_…¹ refer…² decre…³ incre…⁴ change balance
    #>   <int> <chr> <chr>           <chr>       <chr>     <dbl>   <dbl>  <dbl>   <dbl>
    #> 1     1 day1  <NA>            ATM         add_ca…    NA     50     50       50  
    #> 2     2 day2  Garden Cinema   <NA>        cinema…    10.8   NA    -10.8     39.2
    #> 3     3 day3  Pasta House     <NA>        meal       12.5   NA    -12.5     26.7
    #> 4     4 day4  <NA>            Sarah       gift       NA     30     30       56.7
    #> 5     5 day5  Blue Superstore <NA>        shoppi…    15.2   NA    -15.2     41.4
    #> 6     6 day6  Jane            <NA>        reimbu…    NA      5.4    5.4     46.8
    #> 7     7 day7  Joe             <NA>        reimbu…    NA      7.25   7.25    54.1
    #> # … with abbreviated variable names ¹​received_from, ²​reference, ³​decrease,
    #> #   ⁴​increase