prometheus cadvisor

Alert if a Docker container stops


I'm monitoring several containers using Prometheus, cAdvisor and Prometheus Alertmanager. I want to get an alert if a container goes down for some reason. The problem is that when a container dies, cAdvisor no longer collects any metrics for it, so any query returns 'no data' because nothing matches.


Solution

  • Take a look at the Prometheus function absent():

    absent(v instant-vector) returns an empty vector if the vector passed to it has any elements and a 1-element vector with the value 1 if the vector passed to it has no elements.

    This is useful for alerting on when no time series exist for a given metric name and label combination.

    Examples:

    absent(nonexistent{job="myjob"})  => {job="myjob"}
    absent(nonexistent{job="myjob",instance=~".*"})  => {job="myjob"}
    absent(sum(nonexistent{job="myjob"}))  => {}

    Here is an example alert, written in the Prometheus 1.x rule syntax:

    ALERT kibana_absent
      IF absent(container_cpu_usage_seconds_total{com_docker_compose_service="kibana"})
      FOR 5s
      LABELS {
        severity="page"
      }
      ANNOTATIONS {
        SUMMARY = "Service {{$labels.com_docker_compose_service}} down",
        DESCRIPTION = "Service {{$labels.com_docker_compose_service}} has reported no cAdvisor metrics for more than 5 sec."
      }
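
    Prometheus 2.0 and later expect alerting rules in YAML rule files rather than the ALERT/IF syntax. As a rough sketch, the same alert might look like the following there. The group name is a hypothetical placeholder, and the label name is taken as-is from the rule above, so it may differ depending on how cAdvisor exports container labels in your setup. The annotations reference com_docker_compose_service because the result of absent() only carries labels taken from equality matchers in the selector.

    ```yaml
    groups:
      - name: container-alerts   # hypothetical group name, pick your own
        rules:
          - alert: kibana_absent
            # Fires when no series matches the selector, i.e. cAdvisor
            # no longer reports the kibana container.
            expr: absent(container_cpu_usage_seconds_total{com_docker_compose_service="kibana"})
            for: 5s
            labels:
              severity: page
            annotations:
              summary: "Service {{ $labels.com_docker_compose_service }} down"
              description: "Service {{ $labels.com_docker_compose_service }} has reported no cAdvisor metrics for more than 5 seconds."
    ```

    A rule file like this can be checked with `promtool check rules <file>` before it is loaded. Note that a `for` duration shorter than the rule evaluation interval has little practical effect, so a longer value may be worth considering.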