Prometheus service discovery with docker-compose


I have the following docker-compose file:

version: '3.4'

services:
    serviceA:
        image: <image>
        command: <command>
        labels:
           servicename: "service-A"
        ports:
         - "8080:8080"

    serviceB:
        image: <image>
        command: <command>
        labels:
           servicename: "service-B"
        ports:
         - "8081:8081"

    prometheus:
        image: prom/prometheus:v2.32.1
        container_name: prometheus
        volumes:
          - ./prometheus:/etc/prometheus
          - prometheus_data:/prometheus
        command:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--web.console.libraries=/etc/prometheus/console_libraries'
          - '--web.console.templates=/etc/prometheus/consoles'
          - '--storage.tsdb.retention.time=200h'
          - '--web.enable-lifecycle'
        restart: unless-stopped
        expose:
          - 9090

        labels:
          org.label-schema.group: "monitoring"

volumes:
    prometheus_data: {}

The docker-compose file also contains a Prometheus instance with the following configuration:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.


scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090', 'serviceA:8080', 'serviceB:8081']

ServiceA and ServiceB each expose Prometheus metrics on their own port.

With one instance of each service everything works fine, but when I scale the services and run more than one instance, the metrics collection gets mixed up and the data is corrupted.

I looked for a docker-compose service discovery mechanism to solve this but didn't find a suitable one. How can I solve this?


Solution

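  • The solution to this problem is to use actual service discovery instead of static targets. With a static target such as serviceA:8080, each scrape can land on a different replica, so samples from several containers end up in the same series. With service discovery, Prometheus scrapes every replica as a separate target on each iteration.

    If you are using plain docker-compose (not Swarm), you can use DNS service discovery (dns_sd_config) to obtain all IPs belonging to a service: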

    # docker-compose.yml
    version: "3"
    services:
      prometheus:
        image: prom/prometheus
    
      test-service:  # <- this
        image: nginx
        deploy:
          replicas: 3
    ---
    # prometheus.yml
    scrape_configs:
      - job_name: test
        dns_sd_configs:
          - names:
              - test-service  # goes here
            type: A
            port: 80
    

    This is the simplest way to get things up and running.
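
    Applied to the services from the question, the same idea might look like the sketch below. It assumes the compose service names (serviceA, serviceB) resolve inside the compose network and that the metrics ports are 8080 and 8081 as in the original file; the job names and the refresh_interval value are illustrative choices, not something the question specifies.

    ```yaml
    # prometheus.yml (sketch adapted to the question's services)
    global:
      scrape_interval: 15s

    scrape_configs:
      # Prometheus scraping itself stays a static target.
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']

      # One job per service; every A record returned for the
      # service name becomes its own target.
      - job_name: 'service-a'        # illustrative job name
        dns_sd_configs:
          - names: ['serviceA']
            type: A
            port: 8080
            refresh_interval: 15s    # assumption: re-resolve every 15s

      - job_name: 'service-b'        # illustrative job name
        dns_sd_configs:
          - names: ['serviceB']
            type: A
            port: 8081
            refresh_interval: 15s
    ```

    Each replica then shows up with its own instance label (container IP and port), so the series no longer overlap. Note that a fixed host port mapping such as "8080:8080" can only be bound by one container at a time, so it would likely have to be dropped or changed before scaling.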

    Next, you can use the dedicated Docker service discovery: docker_sd_config. Apart from the target list, it gives you more data in labels (e.g. container name, image version, etc.), but it also requires a connection to the Docker daemon to get this data. In my opinion this is overkill for a development environment, but it might be essential in production. Here is an example configuration, copied verbatim from https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-docker.yml :

    # An example scrape configuration for running Prometheus with Docker.
    
    scrape_configs:
      # Make Prometheus scrape itself for metrics.
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
    
      # Create a job for Docker daemon.
      #
      # This example requires Docker daemon to be configured to expose
      # Prometheus metrics, as documented here:
      # https://docs.docker.com/config/daemon/prometheus/
      - job_name: "docker"
        static_configs:
          - targets: ["localhost:9323"]
    
      # Create a job for Docker Swarm containers.
      #
      # This example works with cadvisor running using:
      # docker run --detach --name cadvisor -l prometheus-job=cadvisor
      #     --mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro
      #     --mount type=bind,src=/,dst=/rootfs,ro
      #     --mount type=bind,src=/var/run,dst=/var/run
      #     --mount type=bind,src=/sys,dst=/sys,ro
      #     --mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro
      #     google/cadvisor -docker_only
      - job_name: "docker-containers"
        docker_sd_configs:
          - host: unix:///var/run/docker.sock # You can also use http/https to connect to the Docker daemon.
        relabel_configs:
          # Only keep containers that have a `prometheus-job` label.
          - source_labels: [__meta_docker_container_label_prometheus_job]
            regex: .+
            action: keep
          # Use the task labels that are prefixed by `prometheus-`.
          - regex: __meta_docker_container_label_prometheus_(.+)
            action: labelmap
            replacement: $1
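
    For the compose file in the question, a reduced variant could filter on the existing servicename container label instead of prometheus-job. This is only a sketch under a few assumptions: the Docker socket is mounted read-only into the Prometheus container, the Prometheus process is allowed to read it (which may require adjusting the container's user or group), and the job name and target label names are illustrative.

    ```yaml
    # docker-compose.yml (fragment): expose the Docker API to Prometheus
    services:
      prometheus:
        image: prom/prometheus:v2.32.1
        volumes:
          - ./prometheus:/etc/prometheus
          # assumption: read-only socket mount is acceptable in this setup
          - /var/run/docker.sock:/var/run/docker.sock:ro
    ---
    # prometheus.yml
    scrape_configs:
      - job_name: "compose-services"   # illustrative job name
        docker_sd_configs:
          - host: unix:///var/run/docker.sock
        relabel_configs:
          # Only keep containers that carry a `servicename` label.
          - source_labels: [__meta_docker_container_label_servicename]
            regex: .+
            action: keep
          # Copy the container label onto every scraped series.
          - source_labels: [__meta_docker_container_label_servicename]
            target_label: servicename
          # Record the container name, without a leading slash if present.
          - source_labels: [__meta_docker_container_name]
            regex: /?(.+)
            target_label: container
    ```

    Discovery yields one target per container port, so a container that exposes several ports may need an additional keep rule on the port.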
    

    Finally, there is dockerswarm_sd_config, which is meant to be used with Docker Swarm. It is the most complex of the three, which is why there is a comprehensive official setup guide. Like docker_sd_config, it exposes additional information about containers in labels, and even more than that (for example, it can tell which node a container is running on). An example configuration is available here: https://github.com/prometheus/prometheus/blob/release-2.33/documentation/examples/prometheus-dockerswarm.yml , but you should really read the docs to understand it and tune it for your own setup.
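
    To give a rough idea of its shape, here is a heavily reduced sketch of a Swarm task discovery job. It assumes Prometheus runs on a manager node with access to the Docker socket; the job name and the service/node target labels are illustrative, and the official example linked above remains the reference.

    ```yaml
    # prometheus.yml (reduced sketch, not a complete Swarm setup)
    scrape_configs:
      - job_name: "swarm-tasks"        # illustrative job name
        dockerswarm_sd_configs:
          - host: unix:///var/run/docker.sock
            role: tasks                # discover individual service tasks
        relabel_configs:
          # Only keep tasks that are supposed to be running.
          - source_labels: [__meta_dockerswarm_task_desired_state]
            regex: running
            action: keep
          # Expose the Swarm service name on every series.
          - source_labels: [__meta_dockerswarm_service_name]
            target_label: service
          # Expose the node the task is scheduled on.
          - source_labels: [__meta_dockerswarm_node_hostname]
            target_label: node
    ```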