Tags: list, function, kdb+, percentile, sliding-window

KDB moving percentile using Swin function


I am trying to create a list of 99th and 1st percentiles. Rather than a single percentile for today, I want a percentile for each of 500 days, each computed from the prior 500 days. The functions I am using for this are the following:

swin:{[f;w;s] f each { 1_x,y }\[w#0;s]}
percentile:{[x;y] y (100 xrank y:asc y) bin x}

swin[percentile[99;];500;List]

The issue I run into is that the 99th percentile calculates perfectly, but the 1st percentile makes the entire list equal to 0. I'm a bit lost as to why it would do that. Suggestions appreciated!


Solution

  • Two things combine to cause the zeros:

    1. The earliest 500 days don't have 500 days of history to work with: on day 1 there's only 1 data point, on day 2 only 2, and so on. Only on the 500th day are there 500 days of actual data. You need to decide what behaviour you want for those early days; by default that swin function fills the gaps with a seed value.
    2. You're using zero as that seed value, i.e. w#0.

    For example a 5 day lookback on each date looks something like:

    q)swin[::;5;1 2 3 4 5]
    0 0 0 0 1
    0 0 0 1 2
    0 0 1 2 3
    0 1 2 3 4
    1 2 3 4 5
    

    Each window is padded with zeros until enough real data has arrived, so naturally the 1st percentile picks up those zeros for roughly the first 500 dates.
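
    A single padded window shows the effect directly. This is only a small sketch: it reuses the percentile function from the question on the first window of the 5-day example above, where 4 of the 5 values are seed zeros:

    q)percentile:{[x;y] y (100 xrank y:asc y) bin x}
    q)percentile[1;0 0 0 0 1]    / the sorted window starts with the seeds, so the low percentile is a seed
    0
    q)percentile[99;0 0 0 0 1]   / the top of the sorted window is real data, so the 99th is unaffected
    1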

    So then you can decide to seed with a different value, or else possibly exclude zeros from your percentile function:

    q)List:1000?1000
    q)percentile:{[x;y] y (100 xrank y:asc y except 0) bin x}
    q)swin[percentile[1;];500;List]
    908 360 360 257 257 257 90 90 90 90 90 90 90 90...
    

    If zeros are a legitimate value in your list and can't be excluded then maybe seed the swin with some other value that you know won't be in the list (negatives? infinity? null?) and then exclude that seed from the percentile function.
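
    One way to sketch that idea is to seed with a null of the same type as the list and drop nulls before ranking. The names swinNull and percentileNN are made up for illustration, and the sketch assumes the data itself contains no nulls:

    q)swinNull:{[f;w;s] f each {1_x,y}\[w#first 0#s;s]}   / first 0#s is a null of the list's own type
    q)percentileNN:{[x;y] y (100 xrank y:asc y where not null y) bin x}   / rank only the non-null values
    q)swinNull[percentileNN[1;];5;3 0 2 7 1]
    3 0 0 0 0

    Here the zero on day 2 is real data, so it is kept, while the null padding never reaches the ranking.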

    EDIT: A final alternative is to use a different sliding window function which doesn't fill gaps with a seed value, e.g.

    q)swin2:{[f;w;s] f each(),/:{neg[x]sublist y,z}[w]\[s]}
    q)swin2[::;5;1 2 3 4 5]
    ,1
    1 2
    1 2 3
    1 2 3 4
    1 2 3 4 5
    
    q)percentile:{[x;y] y (100 xrank y:asc y) bin x}
    q)swin2[percentile[99;];500;List]
    908 908 908 908 908 908 908 908 908 908 908 959 959..
    q)swin2[percentile[1;];500;List]
    908 360 360 257 257 257 90 90 90 90 90 90 90 90 90..