I have a main dataframe named raw which looks like this:
tab:([]date:2018.02.05 2018.02.05 2018.02.06 2018.02.06;time:01:30:25.000 02:30:45.000 04:15:15.000 02:15:15.000;vol:50 55 64 12; name:`A`B`B`A)
date time vol name
2018.02.05 1:30:25 50 A
2018.02.05 2:30:45 55 B
2018.02.06 4:15:15 64 B
2018.02.06 2:15:15 12 A
I need to create a new table dependent on conditions like:
Between two certain dates I need to find times when cumulative vol is 100 for name B within a two hour period.
The logic which I think should work: arrange the data by ascending order of time. Add all the vol by name= `B for time within (time[i]: time[i]+2hrs). If the cum vol > 100, return the time intervals and corresponding date. Continue with i +1 . I'm new to kdb so I'm facing difficulties implementing it.
Sample output:
time1 time2 date1 date2
1:30:00 3:30:00 2018.02.05 2018.02.05
23:00:00 1:00:00 2018.02.05 2018.02.06
Any leads on this is appreciated. Thanks
I believe a solution to your problem can be reached using aj
Initially as you pointed out, the table should be sorted by time
`time xasc `tab;
Then, a cumulative sum of the volumes should be created, using sums
tab:update cumvol:sums vol by name from tab
Then using aj - to get the cumulative sums of the volumes that are not within the 2 hour period for each time.
aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab]
We can then do cumvol - cumvol2 to get the total volume in each 2 hour period
tab:select time, name, runningvol:cumvol-0^cumvol2 from
aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab]
Then a simple select statement can get the times where the cumvol is greater than 100
select time,time+02:00 from tab where runningvol>100
An improvement that could be added to this would be to add a grouped attribute on the 2nd table in the aj. Another improvement to this would be to format the dates and times into a single timestamp or datetime.
More info on the functions aj and sums can be found here: