Search code examples
joinkdb

KDB+ Determine whether a price has persisted for a time (by RIC)


I have a table like the below:

test:([] time:09:30:00.500000000 09:30:00.500500000 09:30:00.600000000 09:30:00.610000000 09:30:00.610100000 09:30:00.620500000 09:30:00.621000000 09:30:00.630000000; RIC:`AAPL`AAPL`AAPL`IBM`AAPL`AAPL`AAPL`IBM; price:1 2 2 11 2 2 3 12; moreThan1MS:`N`Y`Y`Y`Y`N`NA`NA)

Though my table does not have the moreThan1MS column. The intention of that column is to determine whether a price was persistent for longer than 1 millisecond. So for the first entry, the price of AAPL changed .5 ms later, so its false. For the first entry of IBM, the price of IBM changed 20 ms later, so its true, although the price of AAPL changed .1 ms afterwards.

I find that edge case (sorting by RIC) to be the hard part; what I've tried so far was calculating the abs delta of price by RIC, then doing an as-of join on the table shifted by 1 ms, and see if the price change of the shifted table doesn't match the unshifted table. This is wonky, and I don't think it works the way I think it should, but most notably it doesn't discern by RIC, so it considers the first IBM entry to be short (since AAPL price change happens right afterwards).

Is there a streamlined way to calculate a column like this?

(This is what I tried:)

test:update X:(abs deltas price) by RIC from test;
test:aj[`time;test;select time-(1000000), Y:X from test];
test:update isLong:(Y=0)|(X=Y) from test;

UPDATE:

Following Jason's answer, I tried the below but found an edge case:

t:([] time:09:30:00.500000000 09:30:00.501000000 09:30:00.502000000 09:30:00.503000000 09:30:00.540000000 09:30:00.610000000 09:30:00.610100000 09:30:00.690000000 09:30:00.700000000 09:30:00.730000000; RIC:`AAPL`AAPL`AAPL`AAPL`AAPL`IBM`AAPL`AAPL`AAPL`IBM; price:1 1 2 2 1 11 2 2 3 12; moreThan50MS:`N`N`N`N`Y`Y`Y`N`NA`NA)
update moreThan50MSsignal:`N`Y price=price([]RIC;time-50000000)bin([]RIC;time)from`t;

time                    RIC     price   moreThan50MS   moreThan50MSsignal
0D09:30:00.500000000    AAPL    1       N              Y
0D09:30:00.501000000    AAPL    1       N              Y
0D09:30:00.502000000    AAPL    2       N              N
0D09:30:00.503000000    AAPL    2       N              N
0D09:30:00.540000000    AAPL    1       Y              Y
0D09:30:00.610000000    IBM     11      Y              Y
0D09:30:00.610100000    AAPL    2       Y              Y
0D09:30:00.690000000    AAPL    2       N              N
0D09:30:00.700000000    AAPL    3       NA             Y
0D09:30:00.730000000    IBM     12      NA             Y

The edge case is shown in the first and second entry.

The current implementation looks at the price 50ms ahead, but fails to consider price changes that revert back to the original price within the timespan. So, in the first and second entries, the current code looks at the 5th entry (latest entry as of 50ms ahead) and sees the price is the same as now, so marks it as True. It needs to notice that there was a price change between now and then, but I'm not sure how to implement such a change.

I was thinking about using whether or not the cumsum of the abs(deltas) is non-zero, but I'm not sure how to apply that in a window in an as of join. Another idea I had was to force msums to sum based on a time window instead of rows, but again struggling to implement.


Solution

  • Using window join wj1 you can aggregate values within a window, which could work for your use case which requires checking if values change within the timespan.

    The documentation for wj1 states that the table to be joined

    "should be sorted `sym`time with `p# on sym".

    The table t0 can be created to meet these requirements, noting that price has been renamed so as not to conflict the the price column in t:

    t0:update `p#RIC from `RIC`time xasc select time,RIC,mt50ms:price from t;
    time                 RIC  mt50ms
    --------------------------------
    0D09:30:00.500000000 AAPL 1
    0D09:30:00.501000000 AAPL 1
    ...
    0D09:30:00.700000000 AAPL 3
    0D09:30:00.610000000 IBM  11
    0D09:30:00.730000000 IBM  12
    

    Performing a window join using the identity function :: shows the values collected in each interval:

    period:00:00:00.05
    
    w:(0;period)+\:exec time from t
    
    wj1[w;`RIC`time;t;(t0;(::;`mt50ms))]
    time                 RIC  price moreThan50MS moreThan50MSsignal mt50ms
    -------------------------------------------------------------------------
    0D09:30:00.500000000 AAPL 1     N            Y                  1 1 2 2 1
    0D09:30:00.501000000 AAPL 1     N            Y                  1 2 2 1
    0D09:30:00.502000000 AAPL 2     N            N                  2 2 1
    0D09:30:00.503000000 AAPL 2     N            N                  2 1
    0D09:30:00.540000000 AAPL 1     Y            Y                  ,1
    0D09:30:00.610000000 IBM  11    N            Y                  ,11
    0D09:30:00.610100000 AAPL 2     Y            Y                  ,2
    0D09:30:00.690000000 AAPL 2     N            N                  2 3
    0D09:30:00.700000000 AAPL 3     NA           Y                  ,3
    0D09:30:00.730000000 IBM  12    NA           Y                  ,12
    

    This can be replaced with a custom function to assess all the values in the interval. In this case I've just checked there is no more than 1 distinct value, which indicates no change over the interval.

    res:wj1[w;`RIC`time;t;(t0;({`N`Y 1=count distinct x};`mt50ms))]
    time                 RIC  price moreThan50MS moreThan50MSsignal mt50ms
    ----------------------------------------------------------------------
    0D09:30:00.500000000 AAPL 1     N            Y                  N
    0D09:30:00.501000000 AAPL 1     N            Y                  N
    0D09:30:00.502000000 AAPL 2     N            N                  N
    0D09:30:00.503000000 AAPL 2     N            N                  N
    0D09:30:00.540000000 AAPL 1     Y            Y                  Y
    0D09:30:00.610000000 IBM  11    N            Y                  Y
    0D09:30:00.610100000 AAPL 2     Y            Y                  Y
    0D09:30:00.690000000 AAPL 2     N            N                  N
    0D09:30:00.700000000 AAPL 3     NA           Y                  Y
    0D09:30:00.730000000 IBM  12    NA           Y                  Y
    

    You can then add NA if necessary. I've assumed it applies to intervals that extend beyond the last time in the table.

    res:update mt50ms:`NA from res where time>last[time]-period,i=(last;i)fby RIC
    time                 RIC  price moreThan50MS moreThan50MSsignal mt50ms
    ----------------------------------------------------------------------
    0D09:30:00.500000000 AAPL 1     N            Y                  N
    0D09:30:00.501000000 AAPL 1     N            Y                  N
    0D09:30:00.502000000 AAPL 2     N            N                  N
    0D09:30:00.503000000 AAPL 2     N            N                  N
    0D09:30:00.540000000 AAPL 1     Y            Y                  Y
    0D09:30:00.610000000 IBM  11    N            Y                  Y
    0D09:30:00.610100000 AAPL 2     Y            Y                  Y
    0D09:30:00.690000000 AAPL 2     N            N                  N
    0D09:30:00.700000000 AAPL 3     NA           Y                  NA
    0D09:30:00.730000000 IBM  12    NA           Y                  NA