Tags: kubernetes, stackdriver, google-kubernetes-engine

Set up alerting for pod evictions in GKE


I'm encountering a situation where pods are occasionally getting evicted after running out of memory. Is there any way to set up some kind of alerting where I can be notified when this happens?

As it is, Kubernetes keeps doing its job and re-creating pods after the old ones are removed, and it's often hours or days before I'm made aware that a problem exists at all.


Solution

  • GKE exports Kubernetes Events (what kubectl get events shows) to Stackdriver Logging, where they appear under the "GKE Cluster Operations" resource.

    Next, write a query specifically targeting evictions (the query in my screenshot might not be accurate):

    [Screenshot: Logs Viewer query filtering for eviction events]

    Then click the "CREATE METRIC" button.

    This will create a logs-based metric. On the left sidebar, click "Logs-based metrics", open the context menu of the new metric, and choose the "Create alert from metric" option.

    Next, you'll be taken to the Stackdriver Alerting portal, where you can set up alerts on the metric based on thresholds, etc.
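
Since the query in the screenshot is not reproduced as text, here is a rough sketch of what the filter and the metric definition might look like. It is not the exact query from the answer. It assumes that GKE stores each Kubernetes Event as the log entry's jsonPayload, so that the eviction reason shows up as jsonPayload.reason="Evicted", and that the cluster is identified by the resource.labels.cluster_name label. The function names, the metric name pod_evictions and the cluster name my-cluster are made up for illustration. Check the filter against your own logs before relying on it.

```python
import json


def build_eviction_filter(cluster_name=None):
    """Return a Logging filter string that should match pod eviction events.

    Assumption: exported Kubernetes Events carry their reason in
    jsonPayload.reason, and evicted pods produce the reason "Evicted".
    """
    clauses = ['jsonPayload.reason="Evicted"']
    if cluster_name:
        # Assumption: the monitored resource exposes a cluster_name label.
        clauses.append(f'resource.labels.cluster_name="{cluster_name}"')
    return " AND ".join(clauses)


def build_metric_body(metric_name, log_filter):
    """Return a dict describing a logs-based counter metric.

    The keys (name, description, filter) follow the LogMetric resource of
    the Logging API; everything else is left at its default.
    """
    return {
        "name": metric_name,
        "description": "Counts pod eviction events",
        "filter": log_filter,
    }


if __name__ == "__main__":
    # Hypothetical names, used only to show the resulting structure.
    log_filter = build_eviction_filter("my-cluster")
    print(json.dumps(build_metric_body("pod_evictions", log_filter), indent=2))
```

The filter string can be pasted into the Logs Viewer to check that it matches real eviction events before you click "CREATE METRIC". The dictionary has the shape of the request body for the Logging API's projects.metrics.create method, should you prefer to create the metric without the UI.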