How to correctly configure Alerting yaml rules for Prometheus / Alertmanager


I'm having a horrid time configuring the alerting rules for Prometheus and Alertmanager, so maybe someone can give me a hint in the right direction.

Here are the rules I'm currently trying to implement (taken straight from https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/):

rules.yml:

groups:
- name: example
  rules:

  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

  # Alert for any instance that has a median request latency >1s.
  - alert: APIHighRequestLatency
    expr: api_http_request_latencies_second{quantile="0.5"} > 1
    for: 10m
    annotations:
      summary: "High request latency on {{ $labels.instance }}"
      description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"

With both the amtool and the promtool config check I'm getting the following error:

Checking '/etc/prometheus/rules.yml'  FAILED: yaml: unmarshal errors:
  line 1: field groups not found in type config.plain

amtool: error: failed to validate 1 file(s)

My first guess would be wrong indentation or some other kind of YAML syntax error. However, I've tried multiple alerting rules, different files, and different editors (currently I'm using nano). The YAML has also been checked with multiple YAML linters. So far I've always gotten errors along the lines of the one shown.

Any help or suggestion would be greatly appreciated!

prometheus, version 2.22.2 (branch: HEAD, revision: de1c1243f4dd66fbac3e8213e9a7bd8dbc9f38b2)
  go version:       go1.15.5
  platform:         linux/amd64
alertmanager, version 0.21.0 (branch: HEAD, revision: 4c6c03ebfe21009c546e4d1e9b92c371d67c021d)
  go version:       go1.14.4

yaml linters:

https://codebeautify.org/yaml-validator

https://onlineyamltools.com/validate-yaml

tested alerting rules:

https://grafana.com/blog/2020/02/25/step-by-step-guide-to-setting-up-prometheus-alertmanager-with-slack-pagerduty-and-gmail/

https://rakeshjain-devops.medium.com/prometheus-alerting-most-common-alert-rules-e9e219d4e949

https://github.com/vegasbrianc/prometheus/blob/master/prometheus/alert.rules


Solution

  • Unmarshalling groups fails because groups is supposed to be a list of rule groups:

    groups:
    - name: GroupName
      rules:
      - alert: ...
    

    See the documentation on recording rules; the rule file format is the same as for alerting rules.
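As a rough illustration of that shared format, a single group can hold both kinds of rules. The recorded series name job:up:avg and its expression are hypothetical and not taken from the question:

```yaml
groups:
- name: example
  rules:
  # Recording rule: precomputes an expression and stores it as a new series.
  - record: job:up:avg
    expr: avg by (job) (up)
  # Alerting rule: same file, same list, keyed by "alert" instead of "record".
  - alert: InstanceDown
    expr: up == 0
    for: 5m
```

Both entries sit in the same rules list, so a file like this is checked with the rules command, not the config command.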


    UPDATE (after the post was corrected)

    Your file seems to be correct. The command to check it is:

    promtool check rules /etc/prometheus/rules.yml
    

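    I expect you used the command that checks the config rather than the one that checks the rules. The error points that way: the file was parsed as a configuration (type config.plain), which has no groups field.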

    Please note that amtool validates Alertmanager's config, not Prometheus's, so it cannot be used to check a Prometheus rules file.
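For context, here is a minimal sketch of how the pieces usually fit together. It assumes the main config lives at /etc/prometheus/prometheus.yml and that Alertmanager listens on localhost:9093; both are assumptions, so adjust them to your setup. The rules file is referenced from the Prometheus config and is never loaded by Alertmanager itself:

```yaml
# prometheus.yml (assumed path: /etc/prometheus/prometheus.yml)
rule_files:
  - /etc/prometheus/rules.yml   # the file from the question; validate with "promtool check rules"

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093          # assumed Alertmanager address
```

A file of this shape is what promtool check config expects. Alertmanager's own config (routes and receivers) is a separate file, and that is the one amtool checks.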