Search code examples
azureazure-akskqlazure-data-explorerappinsights

Kusto query for time between records by group in one result list


I have a Debezium instance reading CDC events from a database for several different tables and publishing those messages to the same Kafka/Azure Event Hubs topic. A scaled-out service consumes these messages and logs debug statements with a timestamp and source data table.

Each pod only processes one message at a time - they are single-threaded. I want to know how long it takes a pod to process each type of message for performance evaluation reasons. That is - I want to query for the difference in time between each "Received" log record, grouped by pod, in the consumer service's AKS deployment. With help from this answer I figured it out for individual pods, but I would love to get a view across all the pods with a single query.

Here is an example Kusto query & results for the raw logs I am interested in:

ContainerLogV2 
| where TimeGenerated > ago(2h)
| where PodName startswith "<common pod prefix>"
| where LogMessage contains "received:" 
| project TimeGenerated, LogMessage, PodName
TimeGenerated LogMessage PodName
6/21/2024, 1:34:27 PM Received: {"ID": <integer>, "__dbz__physicalTableIdentifier": "<database alias>.dbo.TABLE_ONE"} <common pod prefix>-<unique pod suffix A>
6/21/2024, 1:33:27 PM Received: {"ID": <integer>, "__dbz__physicalTableIdentifier": "<database alias>.dbo.TABLE_TWO"} <common pod prefix>-<unique pod suffix B>

Based on the question linked above, I was able to get this data for a single pod at a time with the following query:

ContainerLogV2
| where TimeGenerated > ago(2h)
| where PodName startswith "<common pod prefix>"
| where PodName endswith "<unique pod suffix A>"
| where LogMessage contains "received:"
| order by TimeGenerated desc
| extend prev_tg = prev(TimeGenerated)
| parse LogMessage with * "dbo." TABLE_NAME '"}' *
| project TimeGenerated, 
    // LogMessage,
    TABLE_NAME,
    // PodName,
    diff = case(true, abs(prev_tg - TimeGenerated)/1s, double(null))
| where isnotnull(diff)

Which produces data like:

TimeGenerated TABLE_NAME diff
6/21/2024, 1:34:27 PM TABLE_ONE 60
6/21/2024, 1:33:27 PM TABLE_TWO 1
6/21/2024, 1:33:26 PM TABLE_THREE .2538925

From this I can gather messages from TABLE_ONE take a minute to process, so that would be a good target for performance investigation, for example.

Where I am stuck is I do not know how to calculate this difference correctly when the <unique-pod-suffix> filter is removed. I can no longer just calculate the time difference from the previous log record, it has to be the difference from the previous log record from the same pod, since all pods are logging into the same container/table "ContainerLogV2". I don't know how to do that grouping and then flatting back out into a single results table correctly.

Any assistance is appreciated, thanks!


Solution

  • I assume that there is always a pair of Pods to compare with. So always two items of PodA, PodB and so on.

    Then your query could look like this (the datatable was only for me to mock your real data):

    let ContainerLogV2 = datatable(TimeGenerated:datetime, LogMessage:string, PodName:string) [
        datetime('6/24/2024, 1:34:27 PM'), "Received: {\"ID\": <integer>, \"__dbz__physicalTableIdentifier\": \"<database alias>.dbo.TABLE_ONE\"}", "<common pod prefix>-<PodA>",
        datetime('6/24/2024, 1:33:27 PM'), "Received: {\"ID\": <integer>, \"__dbz__physicalTableIdentifier\": \"<database alias>.dbo.TABLE_TWO\"}", "<common pod prefix>-<PodB>",
        datetime('6/24/2024, 1:35:27 PM'), "Received: {\"ID\": <integer>, \"__dbz__physicalTableIdentifier\": \"<database alias>.dbo.TABLE_ONE\"}", "<common pod prefix>-<PodA>",
        datetime('6/24/2024, 1:31:27 PM'), "Received: {\"ID\": <integer>, \"__dbz__physicalTableIdentifier\": \"<database alias>.dbo.TABLE_THREE\"}", "<common pod prefix>-<PodC>",
        datetime('6/24/2024, 2:33:27 PM'), "Received: {\"ID\": <integer>, \"__dbz__physicalTableIdentifier\": \"<database alias>.dbo.TABLE_TWO\"}", "<common pod prefix>-<PodB>",
        datetime('6/24/2024, 3:31:27 PM'), "Received: {\"ID\": <integer>, \"__dbz__physicalTableIdentifier\": \"<database alias>.dbo.TABLE_THREE\"}", "<common pod prefix>-<PodC>",
    ];
    ContainerLogV2
    | where TimeGenerated > ago(1d)
    | where PodName startswith "<common pod prefix>"
    | where LogMessage contains "Received:"
    | order by TimeGenerated desc
    | extend TABLE_NAME = tostring(split(split(LogMessage, "dbo.")[1], "\"")[0])
    | order by PodName desc, TimeGenerated asc
    | extend prev_tg = prev(TimeGenerated)
    | extend PodEqualsPrePod = PodName == prev(PodName)
    | extend Diff = iff(PodEqualsPrePod, abs(prev_tg - TimeGenerated)/1s, double(null))
    | where isnotnull(Diff)
    | project TimeGenerated, TABLE_NAME, Diff
    

    Here is a Fiddle.

    I´ve also improved the parsing of the TABLE_NAME for better performance.