Azure Kubernetes Service (AKS) - Pod restart alert


I want to create an alert rule that fires when a pod restarts repeatedly, e.g. if the pod restarts twice within a 30-minute window.

I have the following Log Analytics query:

KubePodInventory
| where ServiceName == "xxxx"
| project PodRestartCount, TimeGenerated, ServiceName
| summarize AggregatedValue = count(PodRestartCount) by ServiceName, bin(TimeGenerated, 30m) 

However, setting the alert threshold to 2 won't work here, because PodRestartCount is a cumulative counter that is never reset between bins. Any help would be greatly appreciated. Maybe there is a better approach that I'm missing.


Solution

  • To get a per-bin count instead of the running total, use the prev() function on a serialized row set to compute the difference between consecutive bins:

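    KubePodInventory
    | where ServiceName == "<service name>"
    | where Namespace == "<namespace name>"
    // Total restart count reported in each 30-minute bin
    | summarize AggregatedPodRestarts = sum(PodRestartCount) by bin(TimeGenerated, 30m)
    // summarize does not guarantee row order, so sort before using prev()
    | order by TimeGenerated asc
    | serialize
    // Difference from the previous bin
    | extend prevPodRestarts = prev(AggregatedPodRestarts, 1)
    | extend diff = AggregatedPodRestarts - prevPodRestarts
    | where diff >= 2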
    

    This returns the change in the aggregated restart count from one bin to the next, for example:

    TimeGenerated [UTC]         prevPodRestarts diff        AggregatedPodRestarts
    5/12/2020, 12:00:00.000 AM  1,368,477       191,364     1,559,841   
    5/11/2020, 11:00:00.000 PM  1,552,614       3,594       1,556,208   
    5/11/2020, 10:00:00.000 PM  182,217         1,370,397   1,552,614
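
    One caveat: summing PodRestartCount across the whole service depends on how many inventory records fall into each bin, which may explain the very large diffs in the output above. Below is a minimal per-pod sketch of the same prev() technique. It uses a datatable of made-up sample rows so it can be run without a cluster; the pod names and values are hypothetical, and Name is assumed to be the pod-name column, as in KubePodInventory.

    ```kusto
    // Hypothetical sample data standing in for KubePodInventory
    let SamplePods = datatable(TimeGenerated: datetime, Name: string, PodRestartCount: int)
    [
        datetime(2020-05-11 22:05:00), "pod-a", 3,
        datetime(2020-05-11 22:20:00), "pod-a", 3,
        datetime(2020-05-11 22:40:00), "pod-a", 5,
        datetime(2020-05-11 23:10:00), "pod-a", 5,
        datetime(2020-05-11 22:10:00), "pod-b", 0,
        datetime(2020-05-11 22:45:00), "pod-b", 1,
        datetime(2020-05-11 23:05:00), "pod-b", 1
    ];
    SamplePods
    // Highest cumulative counter seen per pod in each 30-minute bin
    | summarize RestartCount = max(PodRestartCount) by Name, bin(TimeGenerated, 30m)
    // order by yields a serialized row set, so prev() can be used directly
    | order by Name asc, TimeGenerated asc
    // Only diff against the previous row when it belongs to the same pod
    | extend PrevRestartCount = iff(prev(Name) == Name, prev(RestartCount), RestartCount)
    | extend Restarts = RestartCount - PrevRestartCount
    | where Restarts >= 2
    ```

    With this sample data, only pod-a in the 22:30 bin remains, since its counter goes from 3 to 5. To try it on real data, replace SamplePods with KubePodInventory filtered by ServiceName and Namespace as in the query above.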
    

    References:
    https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/serializeoperator
    https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/prevfunction