I have order and trades data and need to calculate the remaining volume in an order after trades have happened on it. On the data below, we have volumes against orders, and trades happen when status==2.
Order Status Volume Traded RemainingVolume
1: 412 1 100 NA 100
2: 412 2 NA 46 NA
3: 412 2 NA 15 NA
4: 412 2 NA 39 NA
5: 538 1 10 NA 10
6: 538 2 NA 7 NA
7: 538 2 NA 3 NA
8: 592 1 389 NA 389
9: 592 2 NA 95 NA
10: 648 1 100 NA 100
11: 648 2 NA 100 NA
12: 885 1 50 NA 50
13: 885 2 NA 47 NA
14: 885 2 NA 3 NA
15: 950 1 39 NA 39
16: 950 2 NA 39 NA
I just need to calculate the remaining volume for every order. This is calculated by subtracting the Traded (volume) from the Volume. Since the RemainingVolume at the beginning (status==1) is anyway set to Volume, this is simply (RemainingVolume-Traded), that s all. In other words, I need the following output:
Order Status Volume Traded RemainingVolume
1: 412 1 100 NA 100
2: 412 2 NA 46 54
3: 412 2 NA 15 39
4: 412 2 NA 39 0
5: 538 1 10 NA 10
6: 538 2 NA 7 3
7: 538 2 NA 3 0
8: 592 1 389 NA 389
9: 592 2 NA 95 294
10: 648 1 100 NA 100
11: 648 2 NA 100 0
12: 885 1 50 NA 50
13: 885 2 NA 47 3
14: 885 2 NA 3 0
15: 950 1 39 NA 39
16: 950 2 NA 39 0
Notice how remaining volume has been filled for orders 412, 538, 592, 648, 885 and 950. Assuming the data is in a table mz, rem is RemainingVolume, trdq is Traded, I tried the following:
for (i in 2:nrow(mz)){
if (is.na(mz[i]$rem))mz[i]$rem = mz[i-1]$rem - mz[i]$trdq
}
Works, but is really, really slow. The data here has many millions of rows, hence only data.table solution is feasible. So I tried:
mz[,rem:= ifelse(is.na(rem), shift(rem, 1)-trdq, rem), by = ord]
No answer here as well. I get the RemainingVolume for the first trade, but that's it. The trades below remain NA. What am I missing here? I'm new to data.table, and so something very simple, may be.
Performance is truly key here, given the size of the data. Any help is much appreciated.
baseR way of doing it
dt$RemainingVolume <- ave(replace(dt$RemainingVolume, dt$Status ==2, -1*dt$Traded[dt$Status ==2]), dt$Order, FUN = function(x) cumsum(x))
dt
Order Status Volume Traded RemainingVolume
1: 412 1 100 NA 100
2: 412 2 NA 46 54
3: 412 2 NA 15 39
4: 412 2 NA 39 0
5: 538 1 10 NA 10
6: 538 2 NA 7 3
7: 538 2 NA 3 0
8: 592 1 389 NA 389
9: 592 2 NA 95 294
10: 648 1 100 NA 100
11: 648 2 NA 100 0
12: 885 1 50 NA 50
13: 885 2 NA 47 3
14: 885 2 NA 3 0
15: 950 1 39 NA 39
16: 950 2 NA 39 0
sample data used
dt <- structure(list(Order = c(412L, 412L, 412L, 412L, 538L, 538L,
538L, 592L, 592L, 648L, 648L, 885L, 885L, 885L, 950L, 950L),
Status = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 2L, 1L, 2L), Volume = c(100L, NA, NA, NA, 10L, NA, NA,
389L, NA, 100L, NA, 50L, NA, NA, 39L, NA), Traded = c(NA,
46L, 15L, 39L, NA, 7L, 3L, NA, 95L, NA, 100L, NA, 47L, 3L,
NA, 39L), RemainingVolume = c(100, 54, 39, 0, 10, 3, 0, 389,
294, 100, 0, 50, 3, 0, 39, 0)), row.names = c(NA, -16L), class = "data.frame")