Search code examples
c#kdbq-lang

KDB simple keyed table performance


I have the following table, about 3m rows. I want to know which approach/attributes I can use the squeeze maximum performance out of it. I will be selecting only on exact matches for inst + time.

price_hist_5min_bars:([inst:`$();time:`timestamp$()]price:`float$())

Also, does the approach change if the table is for reporting only or is changing? This table won't have any rows added to it on-the-fly but I will have another intra-day table with the same structure which will have rows added every 1 or 5 mins. All suggestions appreciated!


Solution

  • I have tested with a sample database and I can get a slight improvement if I the use `p# attribute:

    update `p#inst from `inst`time xasc price_hist_5min_bars
    

    In this case `p# has advantages over `g# as the symbols are in order and not dispersed. See the wiki entry on attributes for more information.

    Better performance can be achieved if you restructure the table. Using the following table as an example:

    q)n:10000000
    q)t:2!update `p#sym from `sym`time xasc ([]sym:n?-100?`3;time:.z.d+n?1D;price:n?1000f)
    

    We can set a benchmark using select:

    q)\ts:1000 select from t where sym=`bak,time=2017.11.29D23:59:59.520923942
    210 1180496
    

    You can restructure by grouping all of the time and price values for each sym, sorting on time:

    q)show r:select `s#time,price by sym from t
    sym| time                                                                                        ..
    ---| --------------------------------------------------------------------------------------------..
    aan| `s#2017.11.29D00:00:00.131421536 2017.11.29D00:00:00.214382261 2017.11.29D00:00:00.914720445..
    ...
    

    Working with this restructured data, we can return the price with:

    q)r[`bak][`price] r[`bak][`time]?2017.11.29D23:59:59.520923942
    948.3733
    q)\ts:1000 r[`bak][`price] r[`bak][`time]?2017.11.29D23:59:59.520923942
    4 1824
    

    Where the index of the record within the sym grouping is given by:

    q)r[`bak][`time]?2017.11.29D23:59:59.520923942
    100638
    

    Obviously the above example only returns and atom and not a table. If you wished to have a table you could try something like this:

    q)flip (),/:@[;`sym;:;`bak]flip[r`bak]r[`bak][`time]?2017.11.29D23:59:59.520923942
    time                          price    sym
    ------------------------------------------
    2017.11.29D23:59:59.520923942 948.3733 bak
    q)\ts:1000 flip (),/:@[;`sym;:;`bak]flip[r`bak]r[`bak][`time]?2017.11.29D23:59:59.520923942
    7 2688
    

    But it depends on how you want your output to look.