Search code examples
azureazure-data-explorerkqlkusto-java-sdk

How to summarize time window based on a status in Kusto


I have recently started working with Kusto. I am stuck with a use case where i need to confirm the approach i am taking is right.

I have data in the following format

enter image description here

In the above example, if the status is 1 and if the time frame is equal to 15 seconds then i need to assume it as 1 occurrence.

So in this case 2 occurrence of status. My approach was

if the current and next rows status is equal to 1 then take the time difference and do row_cum_sum and break it if the next(STATUS)!=0.

Even though the approach is giving me correct output, I am assuming the performance can slow down once the size is increased.

I am looking for an alternative approach if any. Also adding the complete scenario to reproduce this with a sample data.

.create-or-alter function with (folder = "Tests", skipvalidation = "true") InsertFakeTrue() {
range LoopTime from ago(365d) to now() step 6s 
| project TIME=LoopTime,STATUS=toint(1)
}

.create-or-alter function with (folder = "Tests", skipvalidation = "true") InsertFakeFalse() {
range LoopTime from ago(365d) to now() step 29s 
| project TIME=LoopTime,STATUS=toint(0)
}


.set-or-append FAKEDATA <| InsertFakeTrue();

.set-or-append FAKEDATA <| InsertFakeFalse();



FAKEDATA
| order by TIME asc
| serialize 
| extend cstatus=STATUS
| extend nstatus=next(STATUS)
| extend WindowRowSum=row_cumsum(iff(nstatus ==1 and cstatus ==1, datetime_diff('second',next(TIME),TIME),0),cstatus !=1)
| extend windowCount=iff(nstatus !=1 or isnull(next(TIME)), iff(WindowRowSum ==15, 1,iff(WindowRowSum >15,(WindowRowSum/15)+((WindowRowSum%15)/15),0)),0 ) 
| summarize  IDLE_COUNT=sum(windowCount)

Solution

    1. The approach in the question is the way to achieve such calculations in Kusto and given that the logic requires sorting is also efficient (as long as the sorted data can reside on a single machine).

    2. Regarding union operator - it runs in parallel by default, you can control the concurrency and spread using hints, see: union operator