Search code examples
azure-data-explorerkql

Delta of timestamp within a group is higher than threshold


I have a telemetry table that has list of request id's, time at which certain activity happened for that request. Here is a simplified sequence of activities done for a request: authUser -> processInput -> calculateRate -> returnResponse and the E2E on average takes ~3 seconds.

At times, certain requests are stuck longer in an activity which I need to figure out (output would be list of req_id, activity that took longer than a threshold). The table contains millions of records so if I sort by Req_id and do a prev() to compare delta by Req_id fails due to sorting memory limit. Any better ways to write this query?

Sample input:

Req_Id Req_time Activity
Guid1 2022-12-01T01:01:01 authUser
Guid1 2022-12-01T01:01:01 processInput
Guid2 2022-12-01T01:01:01 authUser
Guid1 2022-12-01T01:01:02 calculateRate
Guid2 2022-12-01T01:01:03 processInput
Guid3 2022-12-01T01:01:03 authUser
Guid2 2022-12-01T01:01:04 calculateRate
Guid3 2022-12-01T01:01:04 processInput
Guid2 2022-12-01T01:01:05 returnResponse
.. .. ..
.. .. ..
Guid3 2022-12-01T01:01:20 calculateRate
Guid3 2022-12-01T01:01:21 returnResponse

Expected output:

input
| where delta_of_activity_duration > 5 second

Req_Id Activity Duration(sec)
Guid3 calculateRate 16

Solution

  • Use the partition operator to do the calculation per Req_Id.

    datatable(Req_Id:string, Req_time:datetime, Activity:string)
    [
        "Guid1" ,datetime(2022-12-01T01:01:01) ,"authUser"
       ,"Guid1" ,datetime(2022-12-01T01:01:01) ,"processInput"
       ,"Guid2" ,datetime(2022-12-01T01:01:01) ,"authUser"
       ,"Guid1" ,datetime(2022-12-01T01:01:02) ,"calculateRate"
       ,"Guid2" ,datetime(2022-12-01T01:01:03) ,"processInput"
       ,"Guid3" ,datetime(2022-12-01T01:01:03) ,"authUser"
       ,"Guid2" ,datetime(2022-12-01T01:01:04) ,"calculateRate"
       ,"Guid3" ,datetime(2022-12-01T01:01:04) ,"processInput"
       ,"Guid2" ,datetime(2022-12-01T01:01:05) ,"returnResponse"
       ,"Guid3" ,datetime(2022-12-01T01:01:20) ,"calculateRate"
       ,"Guid3" ,datetime(2022-12-01T01:01:21) ,"returnResponse"
    ]
    | partition hint.strategy=shuffle by Req_Id
      (
        order by Req_time asc
        | extend duration = (Req_time - prev(Req_time))/1s
        | where duration >= 5
      ) 
    
    Req_Id Req_time Activity duration
    Guid3 2022-12-01T01:01:20Z calculateRate 16

    Fiddle