I have a question related to a financial transactions dataset. I have two datasets:
The first one containing financial transactions with timestamp.
Account_from Account_to Value Timestamp
1 1 2 25 1
2 1 3 25 1
3 2 1 50 2
4 2 3 20 2
5 2 4 25 2
6 1 2 40 3
7 3 1 20 3
8 2 4 25 3
The other dataset contains account information:
Account_id initial deposit
1 1 200
2 2 100
3 3 150
4 4 200
Now I would like to create a dataset, with financial transactions and the balance of the original account. Furthermore, I would like that the balance of the account changes of time with each transactions made, such that:
Account_from Account_to Value Timestamp Initial_deposit Old_bal_org New_bal_org Old_bal_des New_bal_des
1 1 2 25 1 200 200 175 100 125
2 1 3 25 1 200 175 150 150 175
3 2 1 50 2 100 125 75 150 200
4 2 3 20 2 100 75 55 175 195
5 2 4 25 2 100 55 30 200 225
6 1 2 40 3 200 200 160 30 70
7 3 1 20 3 150 195 175 160 180
8 2 4 25 3 100 70 45 225 250
How would this be possible?
to reproduce data:
dftrans <- structure(list(Account_from = c(1L, 1L, 2L, 2L, 2L, 1L, 3L, 2L
), Account_to = c(2L, 3L, 1L, 3L, 4L, 2L, 1L, 4L), Value = c(25,
25, 50, 20, 25, 40, 20, 25), Timestamp = c(1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L)), class = "data.frame", row.names = c(NA, -8L))
dfacc <- structure(list(Account_id = c(1L, 2L, 3L, 4L), Initial__deposit = c(200, 100, 150, 200)), class = "data.frame", row.names = c(NA, -4L))
Thanks in advance
One possible way to do this:
dftransFinal <- dftrans %>%
# create a record id to keep track of each transaction
rowid_to_column(var = 'recordID') %>%
pivot_longer(cols = c(Account_to, Account_from), names_to = 'accountType',
values_to = 'Account_id') %>%
left_join(dfacc, by = 'Account_id') %>%
# If the record is a 'from' set value to negative so that it is subtracted from balance
mutate(Value = if_else(accountType == 'Account_from', -Value, Value)) %>%
group_by(Account_id) %>%
mutate(sum_changes = cumsum(Value),
# calculate the cumulative sum with a lag for old_bal
sum_changes_lag = lag(cumsum(Value), k = 1, default = 0),
Old_bal_org = Initial__deposit + sum_changes_lag,
New_bal_org = Initial__deposit + sum_changes) %>%
pivot_wider(names_from = 'accountType', values_from = c('Account_id', 'Old_bal_org',
'Initial__deposit',
'New_bal_org', 'Value'),
id_cols = c('recordID', 'Timestamp')) %>%
# select, rename, and order columns
select('Account_from' = 'Account_id_Account_from', 'Account_to' = 'Account_id_Account_to',
'Value' = 'Value_Account_to', Timestamp, 'Initial__deposit' = 'Initial__deposit_Account_from',
'Old_bal_org' = 'Old_bal_org_Account_from', 'New_bal_org' = 'New_bal_org_Account_from',
'Old_bal_des' = 'Old_bal_org_Account_to', 'New_bal_des' = 'New_bal_org_Account_to')