Tags: prometheus, prometheus-operator, kube-state-metrics

Scaling kube-state-metrics in prometheus-operator


In prometheus-operator, I want to increase the kube-state-metrics replicas to 2. Because the default service discovery role is endpoints, Prometheus will scrape each pod separately, so with two replicas every metric is scraped twice. That causes many-to-many matching issues in queries, and it's wasteful.

The issue I had was that a node went down, and the kube-state-metrics pod was among the pods running on it. I didn't know what was going on in my cluster until a new pod was scheduled, so it's important for me to have kube-state-metrics redundant.

How can I configure the kubernetes_sd_configs role for kube-state-metrics to be service, so that Prometheus scrapes the service as a load balancer and not each pod behind the service? Alternatively, how can I scale the kube-state-metrics pods (without sharding)?

Current config:

- job_name: monitoring/prometheus-operator-kube-state-metrics/0
  kubernetes_sd_configs:
  - role: endpoints

What I want:

- job_name: monitoring/prometheus-operator-kube-state-metrics/0
  kubernetes_sd_configs:
  - role: service

Solution

  • Yes, you can.

    Your job that scrapes endpoints keeps only services that carry the annotation prometheus.io/scrape: "true". You can use a different annotation to select the services that should be scraped through the service itself.

    Suppose you have a job like this, which scrapes each endpoint individually:

    - job_name: kubernetes-endpoints
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: "true"
    

    You can add another job that scrapes only the service address instead of each endpoint behind it:

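    - job_name: kubernetes-services
      kubernetes_sd_configs:
        - role: service
      relabel_configs:
        # Keep only services annotated with prometheus.io/probe: "true"
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
          action: keep
          regex: "true"
        # Use the prometheus.io/path annotation as the metrics path, if set
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)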
    

    Then just make sure you set the correct annotations on the service, like so:

    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/probe: "true"
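
    With prometheus-operator, scrape jobs are normally generated from ServiceMonitor objects, so a hand-written job like kubernetes-services above usually has to be supplied separately. One way this might be done is through the additionalScrapeConfigs field of the Prometheus resource, which points at a key in a Secret that holds the extra job list. The following is only a sketch: the resource name, the Secret name and the key are hypothetical, and the monitoring namespace is taken from the job name in the question.

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus                    # hypothetical resource name
      namespace: monitoring               # assumed from the job name in the question
    spec:
      additionalScrapeConfigs:
        name: additional-scrape-configs   # hypothetical Secret in the same namespace
        key: prometheus-additional.yaml   # hypothetical key containing the list of extra jobs
    ```

    The Secret key would then contain the kubernetes-services job as a plain YAML list of scrape configs. Since the operator-generated kube-state-metrics job would otherwise keep scraping every pod through the endpoints role, that job would probably need to be disabled or filtered to avoid duplicates.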