I have the following transaction data spanning three months January-February:
tab.m <- structure(list(Date = structure(c(1580947200, 1581033600, 1581120000,
1581206400, 1581292800, 1581379200, 1581465600, 1581552000, 1581638400,
1583798400, 1583884800, 1583971200, 1584057600, 1584144000, 1584230400,
1584316800, 1584403200, 1587168000, 1587254400, 1587340800, 1587427200,
1587513600, 1587600000, 1587686400), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Month = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4), `Product Type` = c("LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX",
"LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX", "LIZX"), Account = c(931,
931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931, 931,
931, 931, 931, 931, 931, 931, 931, 931, 931, 931), Subsidiary = c(124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124, 124,
124, 124, 124, 124, 124, 124, 124, 124, 124, 124), Description = c("Transaction",
"Transaction X", "Transaction", "Transaction", "Transaction X",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction", "Transaction",
"Transaction", "Transaction", "Transaction", "Transaction"),
`Policy Number` = c(42057926, 42057926, 42057926, 42057926,
42057926, 42057926, 42057926, 42057926, 42057926, 42060466,
42060466, 42060466, 42060466, 42060466, 42060466, 42060466,
42060466, 42060467, 42060467, 42060467, 42060467, 42060467,
42060467, 42060467), Amount = c(10, -10, 20, -20, 30, 24,
23, 22, -0.56, 1, -1, 2, -2, 2, 3, 4, -1, 3, -3, -3, -3,
-3, -3, -3)), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))
I have used the split() function to group the dataframe of transactions by month and policy number:
grouped = split(tab.m,list(tab.m$Month,tab.m$`Policy Number`))
Within each group, there are pairs of reversal transactions, where there is a positive/negative amount and another row before/after with the inverse of that exact same amount. I would like to remove these pairs of transactions from each group and thereafter merge the groups back into a dataframe. It may be that the positive transaction comes first and then a negative one, or vice versa.
Please note that the pairs of reversal transactions are not always adjacent.
Here's an approach with dplyr
:
library(dplyr)
tab.m %>%
group_by(Month,`Policy Number`) %>%
mutate(id = rep(seq_along(rle(abs(Amount))$lengths), #Create a temporary grouping id using run length encoding
times = rle(abs(Amount))$lengths)) %>% #such that sets of Amounts with the same absolute value are together
group_by(Month,`Policy Number`,id) %>% #Group by this new temporary id
mutate(temp = min(table(factor(sign(Amount), #Create a new temporary value that calculates the number
levels = c(-1,1))))) %>% #of positives and negatives, the minimum value of which can be removed
group_by(Month,`Policy Number`, id, Amount) %>% #Group by id and Amount
dplyr::filter(n() > temp) %>% #Filter out values less than the number to remove
dplyr::select(-c(id,temp)) #Remove temporary columns
id Date Month `Product Type` Account Subsidiary Description `Policy Number` Amount
<int> <dttm> <dbl> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 3 2020-02-10 00:00:00 2 LIZX 931 124 Transaction X 42057926 30
2 4 2020-02-11 00:00:00 2 LIZX 931 124 Transaction 42057926 24
3 5 2020-02-12 00:00:00 2 LIZX 931 124 Transaction 42057926 23
4 6 2020-02-13 00:00:00 2 LIZX 931 124 Transaction 42057926 22
5 7 2020-02-14 00:00:00 2 LIZX 931 124 Transaction 42057926 -0.56
6 2 2020-03-12 00:00:00 3 LIZX 931 124 Transaction 42060466 2
7 2 2020-03-14 00:00:00 3 LIZX 931 124 Transaction 42060466 2
8 3 2020-03-15 00:00:00 3 LIZX 931 124 Transaction 42060466 3
9 4 2020-03-16 00:00:00 3 LIZX 931 124 Transaction 42060466 4
10 5 2020-03-17 00:00:00 3 LIZX 931 124 Transaction 42060466 -1
11 1 2020-04-19 00:00:00 4 LIZX 931 124 Transaction 42060467 -3
12 1 2020-04-20 00:00:00 4 LIZX 931 124 Transaction 42060467 -3
13 1 2020-04-21 00:00:00 4 LIZX 931 124 Transaction 42060467 -3
14 1 2020-04-22 00:00:00 4 LIZX 931 124 Transaction 42060467 -3
15 1 2020-04-23 00:00:00 4 LIZX 931 124 Transaction 42060467 -3
16 1 2020-04-24 00:00:00 4 LIZX 931 124 Transaction 42060467 -3