Here is a small example of the kind of data I have:
transactions <- tibble(id = seq(1:7),
day = paste(rep("day", each = 7), seq(1:7), sep = ""),
sent_to = c(NA, "Garden Cinema", "Pasta House", NA, "Blue Superstore", "Jane", "Joe"),
received_from = c("ATM", NA, NA, "Sarah", NA, NA, NA),
reference = c("add_cash", "cinema_tickets", "meal", "gift", "shopping", "reimbursed", "reimbursed"),
decrease = c(NA, 10.8, 12.5, NA, 15.25, NA, NA),
increase = c(50, NA, NA, 30, NA, 5.40, 7.25))
# # A tibble: 7 × 7
# id day sent_to received_from reference decrease increase
# <int> <chr> <chr> <chr> <chr> <dbl> <dbl>
# 1 1 day1 NA ATM add_cash NA 50
# 2 2 day2 Garden Cinema NA cinema_tickets 10.8 NA
# 3 3 day3 Pasta House NA meal 12.5 NA
# 4 4 day4 NA Sarah gift NA 30
# 5 5 day5 Blue Superstore NA shopping 15.2 NA
# 6 6 day6 Jane NA reimbursed NA 5.4
# 7 7 day7 Joe NA reimbursed NA 7.25
I would like to add a "balance" column to this dataset where:
I've been struggling to do this myself as I don't know if there are any existing functions which help with this types of data manipulation. The only function that comes to mind is the dplyr::lag()
but I'm not sure how to use it.
Any help is appreciated :)
You could first create a column of the change
and second use purrr::accumulate
to create your balance
column:
library(dplyr, warn = FALSE)
library(purrr)
transactions |>
mutate(change = coalesce(increase, -decrease),
balance = accumulate(change, ~ .x + .y))
#> # A tibble: 7 × 9
#> id day sent_to received_…¹ refer…² decre…³ incre…⁴ change balance
#> <int> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 day1 <NA> ATM add_ca… NA 50 50 50
#> 2 2 day2 Garden Cinema <NA> cinema… 10.8 NA -10.8 39.2
#> 3 3 day3 Pasta House <NA> meal 12.5 NA -12.5 26.7
#> 4 4 day4 <NA> Sarah gift NA 30 30 56.7
#> 5 5 day5 Blue Superstore <NA> shoppi… 15.2 NA -15.2 41.4
#> 6 6 day6 Jane <NA> reimbu… NA 5.4 5.4 46.8
#> 7 7 day7 Joe <NA> reimbu… NA 7.25 7.25 54.1
#> # … with abbreviated variable names ¹received_from, ²reference, ³decrease,
#> # ⁴increase