Search code examples
azure-data-explorerkql

KQL: aggregate row by time shift and get value of the more recent row


I think it's more easy to look at the data. We've an application that tracks all user interaction inside an intranet.

eventType pageUrl timestamp timeOnPageMs
pageEvent https://url1.com/ 2021-11-05T06:10:11.591Z 0
pageEvent https://url1.com/ 2021-11-05T06:20:11.591Z 23123
pageEvent https://url2.com/ 2021-11-05T06:11:11.591Z 0
pageEvent https://url2.com/ 2021-11-05T06:30:11.591Z 23123

Open page event are identified having timeOnPageMs=0, otherwise it's close page event.

I want to define a query that extracts all page open events, but with timeOnPageMs of the page close event.

eventType pageUrl timestamp timeOnPageMs
pageEvent https://url1.com/ 2021-11-05T06:10:11.591Z 23123
pageEvent https://url2.com/ 2021-11-05T06:11:11.591Z 33123

I tried using a UDF that lookups the required value, but it seems not possible as described in last reply here.

Thanks in advance for anyone who'll help!

Giacomo S.S.


Solution

  • You'll need a SessionId to correlate events (to be able to handle cases when there are multiple Open events of the same URL, followed by multiple Close events of the same URL).

    And then this is how you solve it:

    datatable(eventType:string, pageUrl:string, timestamp:datetime, timeOnPageMs:long, sessionId:string)
    [
        "pageEvent","https://url1.com/",datetime(2021-11-05T06:10:11.591Z),0,"id1",
        "pageEvent","https://url1.com/",datetime(2021-11-05T06:10:15.591Z),0,"id2",
        "pageEvent","https://url1.com/",datetime(2021-11-05T06:11:12.591Z),1500,"id2",
        "pageEvent","https://url1.com/",datetime(2021-11-05T06:20:11.591Z),23123,"id1",
        "pageEvent","https://url2.com/",datetime(2021-11-05T06:11:11.591Z),0,"id3",
        "pageEvent","https://url2.com/",datetime(2021-11-05T06:30:11.591Z),23123,"id3"
    ]
    | summarize take_any(eventType, pageUrl), min(timestamp), max(timeOnPageMs) by sessionId
    

    Result:

    sessionId eventType pageUrl min_timestamp max_timeOnPageMs
    id1 pageEvent https://url1.com/ 2021-11-05 06:10:11.5910000 23123
    id2 pageEvent https://url1.com/ 2021-11-05 06:10:15.5910000 1500
    id3 pageEvent https://url2.com/ 2021-11-05 06:11:11.5910000 23123