
Missing labels in Prometheus alerts


I'm having issues with Prometheus alerting rules. I have various cAdvisor-specific alerts set up, for example:

- alert: ContainerCpuUsage
  expr: (sum(rate(container_cpu_usage_seconds_total[3m])) BY (instance, name) * 100) > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    title: 'Container CPU usage (instance {{ $labels.instance }})'
    description: 'Container CPU usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'

When the condition is met, I can see the alert in the "Alerts" tab in Prometheus, but some labels are missing, which prevents Alertmanager from sending a notification via Slack. Specifically, I attach a custom "env" label to each target:

 {
  "targets": [
   "localhost:8080"
  ],
  "labels": {
   "job": "cadvisor",
   "env": "production",
   "__metrics_path__": "/metrics"
  }
 }

But when an alert based on cAdvisor metrics fires, its only labels are alertname, instance and severity: no job label, no env label. Alerts from other exporters (e.g. node-exporter) work fine, and the label is present.


Solution

  • This is due to the sum function that you use: it takes all the matching time series and adds them together, grouping BY (instance, name). If you run the same query in Prometheus, you will see that sum keeps only the grouping labels:

    {instance="foo", name="bar"}    135.38819037447163
    

    Other aggregation operators such as avg, max and min work the same way. To bring the label back, simply add env to the grouping list: by (instance, name, env).
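
    Applied to the rule from the question, the change might look like the sketch below. Only the grouping clause differs from the original rule. Adding job next to env is an assumption on my part, based on the question mentioning that the job label goes missing as well; drop it if you don't need it.

    ```yaml
    - alert: ContainerCpuUsage
      # env (and, as an assumption, job) are listed in the grouping clause,
      # so sum() keeps them on the resulting series and on the alert.
      expr: (sum(rate(container_cpu_usage_seconds_total[3m])) by (instance, name, job, env) * 100) > 80
      for: 2m
      labels:
        severity: warning
      annotations:
        title: 'Container CPU usage (instance {{ $labels.instance }})'
        description: 'Container CPU usage is above 80%\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}'
    ```

    PromQL also has a without (...) clause, which drops only the listed labels and keeps every other one. Which labels you would need to list there depends on what your cAdvisor version exports, so check the raw series first.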