Search code examples
kdb

What is the right way to iterate through a kdb partitioned table in an client application?


I want to process all the rows of a kdb table in an R program (I use qserver.R). One way to do this is to initialize a memory handler and then iterate through all the rows one of the time, as explained here:

t: select from mytable where ts>12:30:00,ts<15:00:00,price,msg="A"
t[0]
t[1]
t[2]
...

I want to limit the number of client/server calls in R to loop as fast as possible. How can I fetch multiple rows for each call?


Solution

  • NOTE: my answer below assumes that mytable is the partioned database, but that you now have t in memory.

    another option using cut (using "chunks" of 1,000,000 as per your earlier post)

      (`int$1e6) cut t
    

    now you have a list of table "chunks" of your desired size and you can use accordingly.

    I frequently use this for certain functions (particularly in combination with peach).

    A pattern I've found useful is:

     f:`function that does something useful on chunks`
     fa:`function that reaggregates up to final results`
     r:fa raze f peach (`int$`size`)cut t
    

    if you're t is really large (both vertical/horizontal) you might want to avoid cut directly on the table for memory reasons, but can instead cut a list of indices for the table into the appropriate size and then feed the indices to your f and have that index to the t and grab what you want.

    Below a quick comparison of both approaches (note that f here is pointless, but just to prove the point of the cut on t versus indices)

       q)t:flip (`$"c",/:string til 100)!{(`int$1e7)?100} each til 100
       q)\ts a:raze {select c1,c99 from x}each 1000 cut t
       3827 4108103072j
       q)\ts b:raze {select c1,c99 from t[x]}each 1000 cut til count t
       3057 217623200j
       q)4108103072j%217623200j
       18.87714
       q)a~b
       1b