kubernetes, prometheus, kubernetes-helm, prometheus-alertmanager

Error with external_labels config in alertmanager.yml section of Helm Prometheus values.yaml


I've installed Prometheus into my Kubernetes cluster using Helm, as follows:

helm list
NAME        NAMESPACE   REVISION    UPDATED                                 STATUS      CHART               APP VERSION
prometheus  prometheus  9           2021-09-07 08:54:54.262013 +0100 +01    deployed    prometheus-14.6.0   2.26.0

I am trying to set external_labels in values.yaml so that I can identify the time series and alerts sent to Alertmanager. Based on the Prometheus docs, I believe this is the correct config:

alertmanagerFiles:
  alertmanager.yml:
    global:
      external_labels:
        environment: 'perf'

The installation itself succeeds:

helm upgrade --install prometheus .

However, my prometheus-server pod is crashing with the following error:

level=error ts=2021-09-06T18:49:25.059Z caller=coordinator.go:124 component=configuration msg="Loading configuration file failed" file=/etc/config/alertmanager.yml err="yaml: unmarshal errors:\n  line 2: field external_labels not found in type config.plain"

Many of the answers here point to indentation issues, but I can't see what I am doing wrong. From the Prometheus docs:

global:
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]
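For reference, the docs snippet above comes from the Prometheus server configuration reference, so it describes prometheus.yml rather than alertmanager.yml. A minimal sketch of a standalone prometheus.yml that uses it could look like this; the Alertmanager target address and the scrape job are illustrative assumptions, not values taken from the Helm chart.

```yaml
# prometheus.yml: the Prometheus *server* config file (not alertmanager.yml)
global:
  scrape_interval: 1m
  evaluation_interval: 1m
  # Added to series and alerts when this server talks to external systems
  # (federation, remote storage, Alertmanager).
  external_labels:
    environment: perf

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']  # hypothetical Alertmanager address

scrape_configs:
  - job_name: prometheus                  # illustrative self-scrape job
    static_configs:
      - targets: ['localhost:9090']
```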

I have been scratching my head over this for a week or two and would appreciate a second, more experienced pair of eyes. Thank you! 🙏


Solution

  • I managed to get this working. First, I was putting the configuration in completely the wrong place. I figured this out while looking at the GitHub page for Prometheus Alertmanager: I could not find the field defined in its 'good config' test, so it had to be configured somewhere else.

    Indeed, the Prometheus configuration page confirms that external_labels belongs to the Prometheus server's global config, so I added a section under ## Prometheus server ConfigMap entries:

    serverFiles:
      prometheus.yml:
        global:
          external_labels:
            environment: perf
    

    This did not work either; the pod kept crashing. It turns out this should be configured in the part of values.yaml that configures the prometheus-server container itself, under the top-level server field, which is also where the chart's default global values are set. So I added external_labels to that section:

    server: 
      global:
        scrape_interval: 1m
        scrape_timeout: 10s
        evaluation_interval: 1m
        external_labels:
          environment: perf
    

    After upgrading with helm upgrade --install prometheus . I can now see the correct config in the output of kubectl get cm prometheus-server -o yaml, and my PagerDuty alerts now show the environment name in the Summary.

    A side tip for testing alerts without having to kill pods, trigger OOMs, etc.: create an alert whose expr: fires constantly. For example, kube_pod_container_status_restarts_total > 3 keeps firing as long as any container's restart count stays above 3. I set one up by accident, but it proved quite useful.
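A deliberate version of that tip is sketched below. It assumes your chart version loads alerting rules from the serverFiles.alerting_rules.yml key (check your chart's default values.yaml), and the group name, alert name, severity label and summary text are all hypothetical. It uses vector(1) instead of a restart counter, because vector(1) always returns a single sample and so fires regardless of cluster state. With server.global.external_labels set as above, Prometheus should attach environment: perf to this alert when it sends it to Alertmanager.

```yaml
serverFiles:
  alerting_rules.yml:
    groups:
      - name: test-alerts                # hypothetical group name
        rules:
          - alert: TestAlwaysFiring      # hypothetical alert name
            # vector(1) always yields one sample, so this alert never resolves
            expr: vector(1)
            for: 1m
            labels:
              severity: info             # hypothetical routing label
            annotations:
              summary: 'Test alert, should arrive with environment=perf'
```

Remove the rule (or give it a label your Alertmanager routes to a silent receiver) once you have confirmed the label reaches PagerDuty.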