
Grafana Dashboard setup for Prometheus Federation


I am using Prometheus federation to scrape metrics from multiple Kubernetes clusters. It works fine, and I would like to create Grafana dashboards that can be filtered by tenant (cluster). I am trying to use variables, but here is what I don't understand: even though I did not configure anything special for kube_pod_container_status_restarts_total, it carries the tenant label I specified under static_configs, while kube_node_spec_unschedulable does not.

So where does this difference come from, and what should I do? Also, what is the best-practice way to set up a dashboard that can be filtered by cluster name across multiple clusters? Should I use relabeling?

kube_pod_container_status_restarts_total{app="kube-state-metrics",container="backup",....,tenant="022"}

kube_node_spec_unschedulable{app="kube-state-metrics",....,kubernetes_pod_name="kube-state-metrics-7d54b595f-r6m9k",node="022-kube-master01",pod_template_hash="7d54b595f"}

Prometheus Server

prometheus.yml:
  rule_files:
    - /etc/config/rules
    - /etc/config/alerts

  scrape_configs:
    - job_name: prometheus
      static_configs:
        - targets:
          - localhost:9090

Central Cluster

  scrape_configs:
    - job_name: federation_012
      scrape_interval: 5m
      scrape_timeout: 1m

      honor_labels: true
      honor_timestamps: true
      metrics_path: /prometheus/federate

      params:
        'match[]':
          - '{job!=""}'
      scheme: https

      static_configs:
        - targets:
          - host
          labels:
            tenant: "012"

      tls_config:
        insecure_skip_verify: true

    - job_name: federation_022
      scrape_interval: 5m
      scrape_timeout: 1m

      honor_labels: true
      honor_timestamps: true
      metrics_path: /prometheus/federate

      params:
        'match[]':
          - '{job!=""}'
      scheme: https

      static_configs:
        - targets:
          - host
          labels:
            tenant: "022"

      tls_config:
        insecure_skip_verify: true

Solution

  • Central Prometheus server

      scrape_configs:
        - job_name: federate
          scrape_interval: 5m
          scrape_timeout: 1m
    
          honor_labels: true
          honor_timestamps: true
          metrics_path: /prometheus/federate
    
          params:
            'match[]':
              - '{job!=""}'
          scheme: https
    
          static_configs:
            - targets:
              - source_host_012
              - source_host_022
    
          tls_config:
            insecure_skip_verify: true
    

    Source Prometheus (tenant 012)

    prometheus.yml:
      rule_files:
        - /etc/config/rules
        - /etc/config/alerts
    
      scrape_configs:
        - job_name: tenant_012
          static_configs:
            - targets:
              - localhost:9090
              labels:
                tenant: "012"
    

    Source Prometheus (tenant 022)

    prometheus.yml:
      rule_files:
        - /etc/config/rules
        - /etc/config/alerts
    
      scrape_configs:
        - job_name: tenant_022
          static_configs:
            - targets:
              - localhost:9090
              labels:
                tenant: "022"
    

    If you still don't get the needed labels, try adding relabel_configs to your federate job and differentiate metrics by the source job name:

    relabel_configs:
      - source_labels: [job]
        target_label: tenant
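
    If the source job names follow a pattern such as federation_012, a regex can capture just the tenant id instead of copying the whole job name. A sketch, assuming that naming scheme:

    ```yaml
    relabel_configs:
      - source_labels: [job]
        regex: 'federation_(.+)'   # assumes jobs are named federation_<tenant>
        target_label: tenant
        replacement: '$1'
    ```

    With this rule a target scraped by the job federation_012 ends up with tenant="012".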
    

    or extract distinctive information from the __address__ label (or from any other __-prefixed label), for example:

    relabel_configs:
      - source_labels: [__address__]
        target_label: tenant_host
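
    To actually extract a tenant id from the address rather than copy it verbatim, a capture group can be used. A sketch, assuming the targets resolve to names like 012-prometheus:9090 (the host naming is an assumption):

    ```yaml
    relabel_configs:
      - source_labels: [__address__]
        regex: '(\d+)-.*'          # assumes addresses look like <tenant>-prometheus:9090
        target_label: tenant
    ```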
    

    PS: keep in mind that labels starting with __ will be removed from the label set after target relabeling is completed.

    https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
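
    Once every federated series carries the tenant label, the dashboard filter asked about in the question can be built with a Grafana template variable. A sketch; the variable name and the panel expression are assumptions:

    ```promql
    # Grafana dashboard variable (type: Query, Prometheus datasource)
    #   Name:  tenant
    #   Query: label_values(kube_pod_container_status_restarts_total, tenant)

    # Example panel query filtered by the selected tenant(s):
    sum by (pod) (rate(kube_pod_container_status_restarts_total{tenant=~"$tenant"}[5m]))
    ```

    Enabling "Multi-value" on the variable lets a single dashboard show one or several clusters at once, which is the usual pattern for multi-tenant setups.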