Search code examples
kdbq-query

How to sample from KDB table to reduce data before querying?


I have a table of tick data representing prices of various financial instruments up to millisecond precision. Problem is, there are over 5 billion entries, and even the most basic queries takes several minutes.

I only need data with a precision of up to 1 second - is there an efficient way to sample the table so that the precision is reduced to roughly 1 second prior to querying? This should dramatically cut the amount of data and hence execution time.

So far, as a quick hack I've added the condition where i mod 2 = 0 to my query, but is there a better way?


Solution

  • The best way to bucket time data is with xbar

    q)select last price, sum size by 10 xbar time.minute from trade where sym=`IBM
    minute| price size
    ------| -----------
    09:30 | 55.32 90094
    09:40 | 54.99 48726
    09:50 | 54.93 36511
    10:00 | 55.23 35768 
    ...
    

    more info http://code.kx.com/q/ref/arith-integer/#xbar