
OpenTelemetry target allocator does not allocate discovered ServiceMonitors


I have deployed an OpenTelemetry Collector using the following manifest:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-metrics
  namespace: otel-metrics
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    serviceAccount: opentelemetry-targetallocator-sa
    prometheusCR:
      enabled: true
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-metrics'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]

    exporters:
      logging:
        verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:8889"
        send_timestamps: true
        metric_expiration: 180m

    service:
      pipelines:
        metrics:
          receivers:
          - prometheus
          processors: []
          exporters:
          - logging
          - prometheus
      telemetry:
        logs:
          level: "debug"

However, when I look at the ConfigMap generated for the collector, the only URL it contains is scoped to the otel-metrics job:

collector.yaml: |
  exporters:
    logging:
      verbosity: detailed
    prometheus:
      endpoint: 0.0.0.0:8889
      metric_expiration: 180m
      send_timestamps: true
  receivers:
    prometheus:
      config:
        scrape_configs:
        - http_sd_configs:
          - url: http://otel-metrics-targetallocator:80/jobs/otel-metrics/targets?collector_id=$POD_NAME
          job_name: otel-metrics
  service:
    pipelines:
      metrics:
        exporters:
        - logging
        - prometheus
        processors: []
        receivers:
        - prometheus
    telemetry:
      logs:
        level: debug

If I curl the endpoint http://otel-metrics-targetallocator:80/jobs, I can see that the target allocator has discovered a number of jobs. The collector does not pick them up, because the URL in its config is scoped to the otel-metrics job only. Some of the discovered jobs:

"serviceMonitor/monitoring/node-exporter/0": {
  "_link": "/jobs/serviceMonitor%2Fmonitoring%2Fnode-exporter%2F0/targets"
},
"serviceMonitor/cert-manager/cert-manager-service-monitor/0": {
  "_link": "/jobs/serviceMonitor%2Fcert-manager%2Fcert-manager-service-monitor%2F0/targets"
},
"serviceMonitor/monitoring/kube-state-metrics/0": {
  "_link": "/jobs/serviceMonitor%2Fmonitoring%2Fkube-state-metrics%2F0/targets"
},

If I curl only the URL that is injected into the ConfigMap, http://otel-metrics-targetallocator:80/jobs/otel-metrics/targets?collector_id=$POD_NAME, I get a single entry with no targets:

{
  "otel-metrics-collector-0": {
    "_link": "/jobs/otel-metrics/targets?collector_id=otel-metrics-collector-0",
    "targets": []
  }
}

I am not sure whether it is relevant, but the other jobs come from different namespaces than the one the OpenTelemetry collector and operator are deployed in. How can I get the target allocator to put all the ServiceMonitors under one job so that they can be picked up by one collector?


Solution

  • Your prometheus receiver configuration appears to be missing a target_allocator block (see here for an example). Adding it tells the collector to discover those ServiceMonitor jobs. We also have alpha support for doing this automatically by setting a feature gate on your operator deployment (example). I think it's probably time that feature gate moved to beta, so I have opened an issue to say exactly that :)

    Specifically, the missing block is:

      config: |
        receivers:
          prometheus:
            config:
              scrape_configs:
              - job_name: 'otel-collector'
            target_allocator:
              endpoint: http://my-targetallocator-service
              interval: 30s
              collector_id: "${POD_NAME}"
    

    Setting the target_allocator block is what tells the collector to pull the jobs discovered from ServiceMonitors from the target allocator. Without it, the collector only scrapes the jobs listed under scrape_configs.
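
For reference, here is a sketch of how the manifest from the question might look with the target_allocator block added. It has not been tested against a cluster. The endpoint is an assumption: it reuses the Service name that appears in the generated ConfigMap URL (otel-metrics-targetallocator, default port 80), so adjust it if your Service is named differently. Everything else is copied from the question.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-metrics
  namespace: otel-metrics
spec:
  mode: statefulset
  targetAllocator:
    enabled: true
    serviceAccount: opentelemetry-targetallocator-sa
    prometheusCR:
      enabled: true
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-metrics'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
        # target_allocator is a sibling of `config`, not nested inside it.
        target_allocator:
          # Assumed: Service name taken from the URL in the generated ConfigMap.
          endpoint: http://otel-metrics-targetallocator
          interval: 30s
          collector_id: "${POD_NAME}"

    exporters:
      logging:
        verbosity: detailed
      prometheus:
        endpoint: "0.0.0.0:8889"
        send_timestamps: true
        metric_expiration: 180m

    service:
      pipelines:
        metrics:
          receivers:
          - prometheus
          processors: []
          exporters:
          - logging
          - prometheus
      telemetry:
        logs:
          level: "debug"
```

After applying a change like this, the jobs listed under http://otel-metrics-targetallocator:80/jobs should be the ones the collector asks the target allocator for, instead of only the otel-metrics job.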