Search code examples
kdb

Grouping data in kdb with certain condition


I have a main dataframe named raw which looks like this:

tab:([]date:2018.02.05 2018.02.05 2018.02.06 2018.02.06;time:01:30:25.000 02:30:45.000 04:15:15.000 02:15:15.000;vol:50 55 64 12; name:`A`B`B`A)

date           time         vol     name   
2018.02.05     1:30:25      50       A
2018.02.05     2:30:45      55       B
2018.02.06     4:15:15      64       B
2018.02.06     2:15:15      12       A

I need to create a new table dependent on conditions like:

Between two certain dates I need to find times when cumulative vol is 100 for name B within a two hour period.

The logic which I think should work: arrange the data by ascending order of time. Add all the vol by name= `B for time within (time[i]: time[i]+2hrs). If the cum vol > 100, return the time intervals and corresponding date. Continue with i +1 . I'm new to kdb so I'm facing difficulties implementing it.

Sample output:

time1          time2         date1         date2
1:30:00        3:30:00       2018.02.05    2018.02.05
23:00:00       1:00:00       2018.02.05    2018.02.06

Any leads on this is appreciated. Thanks


Solution

  • I believe a solution to your problem can be reached using aj

    Initially as you pointed out, the table should be sorted by time

    `time xasc `tab;
    

    Then, a cumulative sum of the volumes should be created, using sums

    tab:update cumvol:sums vol by name from tab
    

    Then using aj - to get the cumulative sums of the volumes that are not within the 2 hour period for each time.

    aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab]
    

    We can then do cumvol - cumvol2 to get the total volume in each 2 hour period

    tab:select time, name, runningvol:cumvol-0^cumvol2 from 
    aj[`name`time;tab;select time:time+02:00,name,cumvol2:cumvol from tab]
    

    Then a simple select statement can get the times where the cumvol is greater than 100

    select time,time+02:00 from tab where runningvol>100
    

    An improvement that could be added to this would be to add a grouped attribute on the 2nd table in the aj. Another improvement to this would be to format the dates and times into a single timestamp or datetime.

    More info on the functions aj and sums can be found here:

    http://code.kx.com/q/ref/joins/#aj-aj0-asof-join

    http://code.kx.com/q/ref/arith-integer/#sums