I'm trying to produce a Kusto query to measure the "own" duration of the requests (subtracting out durations of dependencies). However, I can't really figure out how to work this out through a pure Kusto query.
To better understand what would would expected, below a sample case:
High level view (where R is the request and Dx the dependencies)
R =============================== (31ms)
D1 ******* (7ms)
D2 ******** (8ms)
D3 ****** (6ms)
D4 ** (2ms)
D5 **** (4ms)
Proj ==*************======******====
D1
overlaps D2
during 2msD5
and D4
shouldn't be taken into account as completely overlapped by other dependenciesProj
being a projection of a potential intermediate step where only meaningful dependencies segments are shownGiven the following testbed dataset
let reqs = datatable (timestamp: datetime, id:string, duration: real)
[
datetime("2020-12-15T08:00:00.000Z"), "r1", 31 // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.002Z"), "r1", 7, // D1
datetime("2020-12-15T08:00:00.007Z"), "r1", 8, // D2
datetime("2020-12-15T08:00:00.021Z"), "r1", 6, // D3
datetime("2020-12-15T08:00:00.023Z"), "r1", 2, // D4
datetime("2020-12-15T08:00:00.006Z"), "r1", 4, // D5
];
In this particular case, the Kusto query, joining the two data tables, should be able to retrieve 12
(duration of the request, removing all dependencies), ie.
Expected total duration = 31 - (7 + 8 - 2) - (6) = 12
Any help to move this forward would be greatly appreciated <3
I succeeded to solve that using that using row_window_session()
. This is a Window function. You can read more about it at Window functions overview.
The solution is:
let reqs = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.000Z"), "r1", 31 // R
];
let deps = datatable (timestamp: datetime, operation_ParentId:string, duration: real)
[
datetime("2020-12-15T08:00:00.002Z"), "r1", 7, // D1
datetime("2020-12-15T08:00:00.007Z"), "r1", 8, // D2
datetime("2020-12-15T08:00:00.021Z"), "r1", 6, // D3
datetime("2020-12-15T08:00:00.006Z"), "r1", 4, // D5
datetime("2020-12-15T08:00:00.023Z"), "r1", 2, // D4
];
deps
| extend endTime = timestamp + totimespan(duration * 10000)
| sort by timestamp asc
| serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
| summarize max(endTime) by operation_ParentId, SessionStarted
| extend diff = max_endTime - SessionStarted
| summarize todouble(sum(diff)) by operation_ParentId
| join reqs on operation_ParentId
| extend diff = duration - sum_diff / 10000
| project diff
The idea here is to sort the entries by the open time, and as long as the next previous end time is later than the current start time, we don't open a new session. Let's explain each line of this query to see how this is being done:
endTime
based on the duration. To normalize the data I'll multiply by 10000 the duration:
| extend endTime = timestamp + totimespan(duration * 10000)
| sort by timestamp asc
timestamp
column. The next two parameters are limits when to start new buckets. Since we don't want to seal a bucket based on time that have passed, I provided 1 hour which will not hit with this input. The forth argument helps us to create a new session based on the data. As long as there are more rows that will result in timestamp > prev(endTime)
they will have the same start time.
| serialize | extend SessionStarted = row_window_session(timestamp, 1h, 1h, timestamp > prev(endTime))
operation_ParentId
to later on join on that key:
| summarize max(endTime) by operation_ParentId, SessionStarted
| extend diff = max_endTime - SessionStarted
| summarize todouble(sum(diff)) by operation_ParentId
req
to get the total starting time:
| join reqs on operation_ParentId
| extend diff = duration - sum_diff / 10000
| project diff
You can find this query running at Kusto Samples open database.
Having said that, please note that this is a linear operation. Meaning that if there are 2 following segments, that should be under the same segment, but they do not intersect, it will fail. For example, adding the following into deps
:
datetime("2020-12-15T08:00:00.026Z"), "r1", 1, // D6
which should not add anything to the calculation, cause it to misbehave. This is because d4
is the previous point, and it has no point of contact with d6
, although d3
covers them both.
To solve that, you need to repeat the same logic of steps 3-5. Unfortunately Kusto does not have recursions, therefore you cannot solve this for any kind of input. But assuming there are no really depth such cases that breaks this logic, I think it is good enough.