Search code examples
azure-data-explorerkql

How to create a rolling time window to find max value in Kusto?


Imagine a data set with some time gaps between the records:

datatable(t:datetime , v: int)
[
    datetime(2022-01-01 07:00), 3,
    datetime(2022-01-01 07:15), 2,
    datetime(2022-01-01 07:30), 4,
    datetime(2022-01-01 07:45), 1,
    datetime(2022-01-01 08:00), 5,
    // GAP!
    datetime(2022-01-01 10:15), 8,
    datetime(2022-01-01 10:30), 3,
    datetime(2022-01-01 10:45), 2,
    // ALSO GAP!
    datetime(2022-01-01 11:30), 1,
]

I'm trying to find a max value for each record within previous hour, excluding the current iteration hour. To visualize it, I want to achieve something like that:

t v prev_hr max_v
2022-01-01 07:00 3 2022-01-01 06:00 null
2022-01-01 07:15 2 2022-01-01 06:15 3
2022-01-01 07:30 4 2022-01-01 06:30 3
2022-01-01 07:45 1 2022-01-01 06:45 4
2022-01-01 08:00 5 2022-01-01 07:00 4
2022-01-01 10:15 8 2022-01-01 09:15 null
2022-01-01 10:30 3 2022-01-01 09:30 8
2022-01-01 10:45 2 2022-01-01 09:45 8
2022-01-01 11:30 1 2022-01-01 10:30 3

I've tried modifying the approach suggested in How to create a window of arbitrary size in Kusto? (so using scan() operator) but had problems applying it to the above. Also, I feel like something like time_window_rolling_avg_fl() might be useful here, but it seems complex for a simple use-case like above.

I feel like what I want to achieve is relatively simple and obvious, but I am just missing it.


Solution

  • Check out the suggestion below. Inspired by aggregations over a sliding window.

    datatable(t:datetime , v: int)
    [
        datetime(2022-01-01 07:00), 3,
        datetime(2022-01-01 07:15), 2,
        datetime(2022-01-01 07:30), 4,
        datetime(2022-01-01 07:45), 1,
        datetime(2022-01-01 08:00), 5,
        // GAP!
        datetime(2022-01-01 10:15), 8,
        datetime(2022-01-01 10:30), 3,
        datetime(2022-01-01 10:45), 2,
        // ALSO GAP!
        datetime(2022-01-01 11:30), 1,
    ]
    | extend bin_t = bin(t, 1m)
    | extend _range = range(bin_t, bin_t+1h, 1m)
    | mv-expand _range to typeof(datetime)
    | as T | join kind=inner T on $left.bin_t == $right._range
    | project t, t1, v, v1
    | summarize max_v = maxif(v1, t1 < t) by t, v
    | project t, v, prev_h = t-1h, max_v
    
    t v prev_h max_v
    2022-01-01 07:00:00.0000000 3 2022-01-01 06:00:00.0000000
    2022-01-01 07:15:00.0000000 2 2022-01-01 06:15:00.0000000 3
    2022-01-01 07:30:00.0000000 4 2022-01-01 06:30:00.0000000 3
    2022-01-01 07:45:00.0000000 1 2022-01-01 06:45:00.0000000 4
    2022-01-01 08:00:00.0000000 5 2022-01-01 07:00:00.0000000 4
    2022-01-01 10:15:00.0000000 8 2022-01-01 09:15:00.0000000
    2022-01-01 10:30:00.0000000 3 2022-01-01 09:30:00.0000000 8
    2022-01-01 10:45:00.0000000 2 2022-01-01 09:45:00.0000000 8
    2022-01-01 11:30:00.0000000 1 2022-01-01 10:30:00.0000000 3