Search code examples
performancekdb

Alternative to using ungroup in kdb?


I have two tables in KDB.

One is a timeseries with a datetime, sym column (spanning multiple dates, eg could be 1mm rows or 2mm rows). Each timepoint has the same number of syms and few other standard columns such as price. Let's call this t1:

`date`datetime`sym`price

The other table is of this structure:

`date`sym`factors`weights 

where factors is a list and weights is a list of equal length for each sym. Let's call this t2.

I'm doing a left join on these two tables and then an ungroup. factors and weights are of not equal length for each sym.

I'm doing the following:

select sum (weights*price) by date, factors from ungroup t1 lj `date`sym xkey t2 

However this is very slow and can be as slow as 5-6 seconds if t1 has a million rows or more.

Calling all kdb experts for some advice!

EDIT:

here's a full example: (apologies for the roundabout way of defining t1 and t2)

interval: `long$`time$00:01:00; 
hops: til 1+ `int$((`long$(et:`time$17:00)-st:`time$07:00))%interval;
times: st + `long$interval*hops; 
dates: .z.D - til .z.D-.z.D-10; 
timepoints: ([] date: dates) cross ([] time:times); 
syms: ([] sym: 300?`5); 
universe: timepoints cross syms; 
t1: update datetime: date+time, price:count[universe]?100.0 from universe;
t2: ([] date:dates) cross syms; 
/ note here my real life t2, doesn't have a count of 10 weights/factors for each sym, it can vary by sym. 
t2: `date`sym xkey update factors: count[t2]#enlist 10?`5, weights: count[t2]#enlist 10?10 from t2; 

/ what is slow is the ungroup 
select sum weights*price by date, datetime, factors from ungroup t1 lj t2

Solution

  • One approach to avoid the ungroup is to work with matrices (aka lists of lists) and take advantage of the optimised matrix-multiply $ seen here: https://code.kx.com/q/ref/mmu/

    In my approach below, instead of joining t2 to t1 to ungroup, I group t1 and join to t2 (thus keeping everything as lists of lists) and then use some matrix manipulation (with a final ungroup at the end on a much smaller set)

    q)\ts res:select sum weights*price by date, factors from ungroup t1 lj t2
    4100 3035628112
    q)\ts resT:ungroup exec first factors,sum each flip["f"$weights]$price by date:date from t2 lj select price by date,sym from t1;
    76 83892800
    
    q)(0!res)~`date`factors xasc `date`factors`weights xcol resT
    1b
    

    As you can see its much quicker (at least on my machine) and the result is identical save for ordering and column names.

    You may still need to modify this solution somewhat to work in your actual use-case (with variable weights etc - in this case perhaps enforce a uniform number of weights across each sym filling with zeros if necessary)