Search code examples
azure-data-explorerkqlsummarize

Determine elapsed time between events attended summarized by users in Kusto?


EmpId   Terminal    TimeStamp
1   A   2021-11-16 05:00:15
2   B   2021-11-16 05:00:15
1   B   2021-11-16 06:05:00
2   C   2021-11-16 09:00:15
1   A   2021-11-16 08:00:15
2   B   2021-11-16 11:00:15

**DataTable**
let T = datatable(EmpId:string , Terminal:string, TimeStamp:datetime )
[
   "1", "A", datetime(2021-11-16 05:00:15),
   "2", "B", datetime(2021-11-16 05:00:15),
   "1", "B", datetime(2021-11-16 06:05:00),
   "1", "A", datetime(2021-11-16 08:00:15),
   "2", "B", datetime(2021-11-16 11:00:15),
   "2", "C", datetime(2021-11-16 09:00:15),
];
T   
    | order by TimeStamp asc
    | extend elapsedTime = datetime_diff('minute', next(TimeStamp), TimeStamp)
| summarize Travelled=count(), TerminalT = strcat_array(make_list(Terminal), "->"), TimeStamp=strcat_array(make_list(TimeStamp), "->"),  ElapsTime=strcat_array(make_list(elapsedTime), "->") by EmpId

Expected result:
EmpId Terminal TimeStamp TimeSpentinMins 1 A->B->A 2021-11-16 05:00:15 - >2021-11-16 06:05:00->2021-11-18 08:00:15 65->115 2 B->C->B 2021-11-16 05:00:15->2021-11-16 09:00:15->2021-11-16 11:00:15 240->120

Expected Result

Not getting the expected result as the serialize/order by clause required is reordering the events. Need a way to group by employee id and then order by TimeStamp within each group, so that the elapsed time is caluculated as expected. Is it feasible?


Solution

  • Please see 2 options below. The 2nd should perform better for larger data sets.

    Option #1

    let T = datatable(EmpId:string , Terminal:string, TimeStamp:datetime )
    [
       "1", "A", datetime(2021-11-16 05:00:15),
       "2", "B", datetime(2021-11-16 05:00:15),
       "1", "B", datetime(2021-11-16 06:05:00),
       "1", "A", datetime(2021-11-16 08:00:15),
       "2", "B", datetime(2021-11-16 11:00:15),
       "2", "C", datetime(2021-11-16 09:00:15),
    ];
    T   
    | order by EmpId, TimeStamp asc
    | extend elapsedTime = iff(next(EmpId) == EmpId, datetime_diff('minute', next(TimeStamp), TimeStamp), long(null))
    | summarize Travelled=count(), 
    TerminalT = strcat_array(make_list(Terminal), "->"), 
    TimeStamp=strcat_array(make_list(TimeStamp), "->"),  ElapsTime=strcat_array(make_list_if(elapsedTime, elapsedTime != long(null)), "->") by EmpId
    
    EmpId Travelled TerminalT TimeStamp ElapsTime
    2 3 B->C->B 2021-11-16T05:00:15.0000000Z->2021-11-16T09:00:15.0000000Z->2021-11-16T11:00:15.0000000Z 240->120
    1 3 A->B->A 2021-11-16T05:00:15.0000000Z->2021-11-16T06:05:00.0000000Z->2021-11-16T08:00:15.0000000Z 65->115

    Option #2: using the partition operator

    let T = datatable(EmpId:string , Terminal:string, TimeStamp:datetime )
    [
       "1", "A", datetime(2021-11-16 05:00:15),
       "2", "B", datetime(2021-11-16 05:00:15),
       "1", "B", datetime(2021-11-16 06:05:00),
       "1", "A", datetime(2021-11-16 08:00:15),
       "2", "B", datetime(2021-11-16 11:00:15),
       "2", "C", datetime(2021-11-16 09:00:15),
    ];
    T 
    | partition hint.strategy=native by EmpId
    (
        order by TimeStamp asc
        | extend elapsedTime = datetime_diff('minute', next(TimeStamp), TimeStamp)
        | summarize Travelled=count(), 
        TerminalT = strcat_array(make_list(Terminal), "->"), 
        TimeStamp=strcat_array(make_list(TimeStamp), "->"),  ElapsTime=strcat_array(make_list(elapsedTime), "->") by EmpId
    )
    
    EmpId Travelled TerminalT TimeStamp ElapsTime
    1 3 A->B->A 2021-11-16T05:00:15.0000000Z->2021-11-16T06:05:00.0000000Z->2021-11-16T08:00:15.0000000Z 65->115
    2 3 B->C->B 2021-11-16T05:00:15.0000000Z->2021-11-16T09:00:15.0000000Z->2021-11-16T11:00:15.0000000Z 240->120