Search code examples
kdb

kdb+/q: Apply iterative procedure with updated variable to a column


Consider the following procedure f:{[x] ..} with starting value a:0:

  1. Do something with x and a. The output is saved as the new version of a, and the output is returned by the function
  2. For the next input x, redo the procedure but now with the new a.

For a single value x, this procedure is easily constructed. For example:

a:0;
f:{[x] a::a+x; :a}       / A simple example (actual function more complicated)

However, how do I make such a function such that it also works when applied on a table column?

I am clueless how to incorporate this step for 'intermediate saving of a variable' in a function that can be applied on a column at once. Is there a special technique for this? E.g. when I use a table column in the example above, it will simply calculate a+x with a:0 for all rows, opposed to also updating a at each iteration.


Solution

  • No need to use global vars for this - can use scan instead - see here.

    Example --

    Generate a table -

    q)t:0N!([] time:5?.z.p; sym:5?`3; price:5?100f; size:5?10000)
        time                          sym price    size
        -----------------------------------------------
        2002.04.04D18:06:07.889113280 cmj 29.07093 3994
        2007.05.21D04:26:13.021438816 llm 7.347808 496
        2010.10.30D10:15:14.157553088 obp 31.59526 1728
        2005.11.01D21:15:54.022395584 dhc 34.10485 5486
        2005.03.06D21:05:07.403334368 mho 86.17972 2318
    

    Example with a simple accumilator - note, the function has access to the other args if needed (see next example):

    q)update someCol:{[a;x;y;z] (a+1)}\[0;time;price;size] from t
        time                          sym price    size someCol
        -------------------------------------------------------
        2002.04.04D18:06:07.889113280 cmj 29.07093 3994 1
        2007.05.21D04:26:13.021438816 llm 7.347808 496  2
        2010.10.30D10:15:14.157553088 obp 31.59526 1728 3
        2005.11.01D21:15:54.022395584 dhc 34.10485 5486 4
        2005.03.06D21:05:07.403334368 mho 86.17972 2318 5
    

    Say you wanted to get cumilative size:

    q)update cuSize:{[a;x;y;z] (a+z)}\[0;time;price;size] from t
        time                          sym price    size cuSize
        ------------------------------------------------------
        2002.04.04D18:06:07.889113280 cmj 29.07093 3994 3994
        2007.05.21D04:26:13.021438816 llm 7.347808 496  4490
        2010.10.30D10:15:14.157553088 obp 31.59526 1728 6218
        2005.11.01D21:15:54.022395584 dhc 34.10485 5486 11704
        2005.03.06D21:05:07.403334368 mho 86.17972 2318 14022
    

    If you wanted more than one var passed through the scan, can pack more values into the first var, by giving it a more complex structure:

    q)update cuPriceAndSize:{[a;x;y;z] (a[0]+y;a[1]+z)}\[0 0;time;price;size] from t
        time                          sym price    size cuPriceAndSize
        --------------------------------------------------------------
        2002.04.04D18:06:07.889113280 cmj 29.07093 3994 29.07093 3994
        2007.05.21D04:26:13.021438816 llm 7.347808 496  36.41874 4490
        2010.10.30D10:15:14.157553088 obp 31.59526 1728 68.014   6218
        2005.11.01D21:15:54.022395584 dhc 34.10485 5486 102.1188 11704
        2005.03.06D21:05:07.403334368 mho 86.17972 2318 188.2986 14022