I am trying to get query-exporter to run in a Docker container. With advice from the developer, I have enabled IPv6 in Docker by putting:
{
  "experimental": true,
  "ip6tables": true
}
in my Docker daemon.json and restarting the daemon.
I am using the following docker-compose file:
version: "3.3"

services:
  prometheus:
    container_name: prometheus
    image: prom/prometheus
    restart: always
    volumes:
      - ./prometheus:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    networks:
      - prom_app_net

  grafana:
    container_name: grafana
    image: grafana/grafana
    user: '472'
    restart: always
    environment:
      GF_INSTALL_PLUGINS: 'grafana-clock-panel,grafana-simple-json-datasource'
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning/:/etc/grafana/provisioning/
      - './grafana/grafana.ini:/etc/grafana/grafana.ini'
    env_file:
      - ./grafana/.env_grafana
    ports:
      - 3000:3000
    depends_on:
      - prometheus
    networks:
      - prom_app_net

  mysql:
    image: mariadb:10.10
    hostname: mysql
    container_name: mysql
    environment:
      MYSQL_RANDOM_ROOT_PASSWORD: "yes"
      MYSQL_DATABASE: slurm_acct_db
      MYSQL_USER: slurm
      MYSQL_PASSWORD: password
    volumes:
      - var_lib_mysql:/var/lib/mysql
    networks:
      - slurm
    # network_mode: host

  slurmdbd:
    image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
    build:
      context: .
      args:
        SLURM_TAG: ${SLURM_TAG:-slurm-21-08-6-1}
    command: ["slurmdbd"]
    container_name: slurmdbd
    hostname: slurmdbd
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - var_log_slurm:/var/log/slurm
      - cgroups:/sys/fs/cgroup:ro
    expose:
      - "6819"
    ports:
      - "6819:6819"
    depends_on:
      - mysql
    privileged: true
    cgroup: host
    networks:
      - slurm
    #network_mode: host

  slurmctld:
    image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
    command: ["slurmctld"]
    container_name: slurmctld
    hostname: slurmctld
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - slurm_jobdir:/data
      - var_log_slurm:/var/log/slurm
      - etc_prometheus:/etc/prometheus
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    expose:
      - "6817"
      - "8080"
      - "8081"
      - "8082/tcp"
    ports:
      - 8080:8080
      - 8081:8081
      - 8082:8082/tcp
    depends_on:
      - "slurmdbd"
    privileged: true
    cgroup: host
    #network_mode: host
    networks:
      - slurm

  c1:
    image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
    command: ["slurmd"]
    hostname: c1
    container_name: c1
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - slurm_jobdir:/data
      - var_log_slurm:/var/log/slurm
      - cgroups:/sys/fs/cgroup:ro
    expose:
      - "6818"
    depends_on:
      - "slurmctld"
    privileged: true
    cgroup: host
    #network_mode: host
    networks:
      - slurm

  c2:
    image: prom-slurm-cluster:${IMAGE_TAG:-21.08.6}
    command: ["slurmd"]
    hostname: c2
    container_name: c2
    volumes:
      - etc_munge:/etc/munge
      - etc_slurm:/etc/slurm
      - slurm_jobdir:/data
      - var_log_slurm:/var/log/slurm
      - cgroups:/sys/fs/cgroup:ro
    expose:
      - "6818"
      - "22"
    depends_on:
      - "slurmctld"
    privileged: true
    cgroup: host
    networks:
      - slurm
    #network_mode: host

volumes:
  etc_munge:
  etc_slurm:
  slurm_jobdir:
  var_lib_mysql:
  var_log_slurm:
  grafana_data:
  prometheus_data:
  cgroups:
  etc_prometheus:

networks:
  prom_app_net:
  slurm:
    enable_ipv6: true
    ipam:
      config:
        - subnet: 2001:0DB8::/112
I then installed query-exporter on the slurmctld container and ran it with the following config.yaml:
databases:
  db1:
    dsn: sqlite:////test.db
    connect-sql:
      - PRAGMA application_id = 123
      - PRAGMA auto_vacuum = 1
    labels:
      region: us1
      app: app1

metrics:
  metric1:
    type: gauge
    description: A sample gauge

queries:
  query1:
    interval: 5
    databases: [db1]
    metrics: [metric1]
    sql: SELECT random() / 1000000000000000 AS metric1
But it is not working: Prometheus lists the target as down.
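For reference, the scrape job in prometheus.yml that defines this target is roughly like the sketch below; the job name is illustrative and the target is a placeholder for the same host:port that the working test exporter (further down) is scraped on:
scrape_configs:
  - job_name: 'query-exporter'          # illustrative job name
    static_configs:
      # Placeholder target: the same host:port the test exporter is reached on.
      - targets: ['slurmctld:8082']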
The container set-up itself seems to be fine, because if I run the following test exporter:
from prometheus_client import start_http_server, Summary
import random
import time

# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')

# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
    """A dummy function that takes some time."""
    time.sleep(t)

if __name__ == '__main__':
    # Start up the server to expose the metrics.
    start_http_server(8082)
    # Generate some requests.
    while True:
        process_request(random.random())
then Prometheus can connect to the target fine.
Can anyone see what the problem could be?
Thanks!
Update
I run query-exporter by hand on the slurmctld container, so there isn't anything about query-exporter in the container logs:
2023-07-10 10:11:37 ---> Starting the MUNGE Authentication service (munged) ...
2023-07-10 10:11:37 ---> Waiting for slurmdbd to become active before starting slurmctld ...
2023-07-10 10:11:37 -- slurmdbd is not available. Sleeping ...
2023-07-10 10:11:39 -- slurmdbd is now active ...
2023-07-10 10:11:39 ---> starting systemd ...
I think the test_query.py that works is using IPv4 on port 8082, while query-exporter is trying to bind to IPv6.
docker port slurmctld
gives:
8080/tcp -> 0.0.0.0:8080
8080/tcp -> [::]:8080
8081/tcp -> 0.0.0.0:8081
8081/tcp -> [::]:8081
8082/tcp -> 0.0.0.0:8082
8082/tcp -> [::]:8082
I guess I need to point Prometheus at 8082/tcp -> [::]:8082
when query-exporter runs, but I'm not sure how to do that.
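If I do have to target the IPv6 socket explicitly, I believe Prometheus accepts bracketed IPv6 literals in static targets, so it would look something like the sketch below (the address is only a placeholder from the 2001:0DB8::/112 subnet above; the real one would come from docker inspect slurmctld):
scrape_configs:
  - job_name: 'query-exporter-v6'       # illustrative job name
    static_configs:
      # Placeholder IPv6 address from the compose subnet, bracketed as
      # Prometheus expects for IPv6 literals.
      - targets: ['[2001:db8::2]:8082']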
In the end, though, running query-exporter config.yaml -H 0.0.0.0 -p 8082
gets it to work. Presumably query-exporter was only binding to localhost by default, so Prometheus could not reach it from outside the container; -H 0.0.0.0 makes it listen on all IPv4 interfaces.
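To avoid starting it by hand each time, I may move it into its own service under services: in the compose file above. This is only an untested sketch: the adonato/query-exporter image name, the /config.yaml mount path, and the 9560 default port are taken from the query-exporter docs as I understand them:
  query-exporter:
    image: adonato/query-exporter                   # assumed upstream image name
    container_name: query-exporter
    volumes:
      - ./query-exporter/config.yaml:/config.yaml   # hypothetical host path for the config shown above
    ports:
      - 9560:9560                                   # query-exporter's default port
    networks:
      - slurm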